I. INTRODUCTION

Automata theory is the study of abstract computing 
devices, or machines, and the class of functions they can 
perform on their inputs. In the 1940's and 1950's, simple 
kinds of machines, so-called finite-state automata, were 
introduced to model brain function [1, 2]. They turned
out to be extremely useful for a variety of other pur- 
poses, such as studying the lower limits of computational 
power and synthesizing logic controllers and communi- 
cation networks. In the late 1950's, the linguist Noam 
Chomsky developed a classification of formal languages 
in terms of the grammars and automata required to recognize
them [3]. On the lowest level of Chomsky's hierarchy, for
example, whether or not a given sentence
obeys the grammatical rules of a language is answered 
by a finite-state automaton. 

Our understanding of the nature of computing has 
changed substantially in the intervening half century. In
recent years the study of computation with elementary 
components that obey quantum mechanical laws has de- 
veloped into a highly active research area. 

A. Finite Quantum Computing 

The physical laws underlying quantum computation 
are a mixed blessing. On the one hand, a growing body 
of theoretical results suggests that a computational de- 
vice whose components are directly governed by quantum 
physics may be considerably more powerful than its clas- 
sical counterpart. Undoubtedly, the most celebrated of 
these results is Shor's factoring algorithm from 1994.
Other results include Grover's quantum search algorithm
from 1996 [5]. On the other hand, these results employ
powerful computational architectures, such as quantum 
Turing machines [6], that are decidedly more powerful
than finite-state machines and that must maintain high 
degrees of internal coherence and environmental isolation 
during operation. For a review of theoretical and experimental
studies of quantum computation see, for example,
Refs. [7, 8].

However, to date, implementation efforts have fallen 
substantially short of the theoretical promise. So far ex- 
perimental tests of quantum computation are on small- 
scale systems — in fact, very small. Currently, the largest 
coherent system of information storage supports only 7 
quantum bits or qubits [9]. Thus, the study of finite-state
quantum automata is motivated by very practical con- 
cerns. They reflect the capabilities of currently feasible 
quantum computers. As was also true in the first days of 
digital computers, though, the study of finite machines is 
also a starting point, here for developing a computational 
hierarchy for quantum dynamical systems. 

B. Dynamics, Information, and Measurement 

A common goal in the practice of quantum theory is to 
predict the expectation of outcomes from an ensemble of 
isolated measurements. There is a key difference, though, 
between this and what one needs to understand quan- 
tum processes. For quantum processes, such as found in 
molecular dynamics, one must analyze behavior; predict- 
ing an observable's mean value is insufficient. 

Quantum mechanics can be extended, of course, to ad- 
dress behavior. This has been done in rather general 
frameworks (e.g., Ref. [10]), as well as in special cases,
such as quantum Markov chains [11]. However, questions
about a quantum system's information processing
capacity remain unanswered. For example, how much of
a quantum system's history is stored in its state? How 
is that information processed to produce future behav- 
ior? More pointedly, even if a system is designed to have 
a desired information processing capacity, the question 
always remains whether or not that capacity is actually 
used during operation. 

An intriguing, but seemingly unrelated area of research 
in quantum behavior is quantum chaos — the production 
of information through the exponential amplification of 
perturbations. Since any quantum system is described
by the Schrödinger equation, which is linear,
chaotic behavior cannot arise. However, quantum sys- 
tems that exhibit chaotic behavior in the classical limit 
can show signatures of chaos in semi-classical regimes. 
Thus, analyzing the relationship between classical and 
quantum dynamical systems plays an important role in 
understanding the origins of quantum information pro- 
duction. 

For quantum systems, in contrast with their classi- 
cal counterparts, including measurement interactions is 
essential to any complete description. Unfortunately, 
this is largely missing from current dynamical theories. 
Nonetheless, simulation studies show that measurement 
interactions lead to genuinely chaotic behavior in quan- 
tum dynamical systems, even far from the semi-classical 
limit [13]. Observation must be the basis for modeling
a quantum process — either in describing its behavior or 
quantifying its computational capacity. 



C. Technical Setting 

Here we introduce finite computation-theoretic quan- 
tum models that, when analyzed with tools from quan- 
tum mechanics and stochastic processes, simultaneously 
embody dynamics, measurement, and information pro- 
cessing. Studies of quantum chaos are, in effect, exten- 
sions of the theory of nonlinear (classical) dynamics. Dy- 
namical systems are often analyzed by transforming them 
into finite-state automata using the methods of symbolic 
dynamics [14]. The quantum automata in the following
model dynamical behavior and include measurement interactions
and so provide a kind of symbolic dynamics for
quantum systems [15]. The result is a line of inquiry complementary
to both quantum computation and quantum
dynamical systems.

One goal is to develop a representation of quantum
processes that allows one to analyze their intrinsic computation.
Intrinsic computation in a dynamical system
is an inherent property of the behavior the system generates
[16]. One asks three basic questions of the system:
First, how much historical information is stored in the 
current state? Second, in what architecture is that in- 
formation stored? Finally, how is the stored information 
transformed to produce future behavior? This approach 
has been used to analyze intrinsic computation in clas- 
sical dynamical systems, statistical mechanical systems, 
and stochastic processes [IE EE EE HE|.

We view the present contribution as a direct extension 
of this prior work and, also, as complementary to the 
current design and theoretical-engineering approach to 
quantum computation. Specifically, we focus on the dy- 
namics of quantum processes, rather than on methods to 
construct devices that implement a desired function. We 
express the intrinsic information processing using various 
kinds of finite-memory devices. We emphasize the effects 
of measurement on a quantum system's behavior and so, 
in this way, provide a somewhat different view of quan- 
tum dynamical systems for which, typically, observation 
is ignored. An information-theoretic analysis using the 
resulting framework can be found in Refs. [15, 21].

Most directly, we are interested, as natural scientists 
are, in behavior — how a system state develops over time. 
In the computation-theoretic setting this translates into 
a need to model generators. In contrast, the conven- 
tional setting for analyzing the computational power of 
automata centers around detecting membership of words 
in a language. As a consequence, the overwhelming frac- 
tion of existing results on automata concerns devices that 
recognize an input string — and on problems that can be 
recast as such. Automata that spontaneously generate 
outputs are much less often encountered, if at all, in 
the theory of computation. Nonetheless, generators are 
necessary if one wants to model physical processes using 
dynamical systems. In particular, as we hope to show, 
quantum generators are a key tool for detecting informa- 
tion processing capabilities inherent in natural processes. 



D. Overview 

Due to the range of topics, in the following we give a 
selective, but self-contained treatment. We review what 
is needed from automata, formal languages, and quan- 
tum theory, though familiarity with those areas is helpful. 
Citations to reference texts are given at the appropriate 
points. 

Our approach will make most sense, especially to those 
unfamiliar with the theory of formal languages, if we de- 
vote some time to reviewing basic automata theory and 
its original goals. This also allows us to establish, in a 
graded fashion, the necessary notation for the full devel- 
opment, clearly identifying which properties are quantum 
mechanical and which, in contrast, are essentially classi- 
cal (and probabilistic). In addition, this illustrates one 
of the principal benefits of discrete computation theory:
i.e., the classification of devices that implement differ- 
ent kinds of information processing. Those for whom 
automata and formal languages are well known, though, 
should appreciate by the end of the review the physical 
and dynamical motivations, since these will be expressed 
within the existing frameworks of discrete computation 
and stochastic processes. 

To lay the foundations for a computational perspective 
on quantum dynamical systems, the most basic notion we
introduce is the class of finite-state automata called quantum
finite-state transducers. To get to these, in the next
sections we introduce the concept of process languages, 
building on formal language theory. We then present 
stochastic finite-state transducers and their subclasses — 
stochastic recognizers and generators — as classical rep- 
resentations of process languages. The relationship be- 
tween automata and languages is discussed in each case 
and we provide an overview (and introduce notation) that 
anticipates their quantum analogs. We then introduce 
quantum finite-state transducers and their subclasses — 
quantum recognizers and generators — and discuss their 
various properties. Finally, we illustrate the main ideas 
by analyzing specific examples of quantum dynamical 
systems that they can model. 

II. FINITARY STOCHASTIC PROCESSES 

Consider the temporal evolution of the state of some 
natural system. The evolution is monitored by a series of 
measurements — numbers registered in some way, perhaps 
continuously, perhaps discretely. Each such measurement 
can be taken as a random variable. The distribution 
over sequences of these random variables is what we refer 
to as a stochastic process. An important question for 
understanding the structure of natural systems is what 
kinds of stochastic processes there are. 

The class of finitary stochastic processes was intro- 
duced to identify those that require only a finite amount 
of internal resources to generate their behavior. This 
property is important in several settings. In symbolic 
dynamical systems, for example, it was shown that the 
sofic subshifts have a form of infinite correlation in their 
temporal behaviors despite being finitely specified [2^ |. 
The information-theoretic characterization of stochastic 
processes [IE HE HH, as another example, defines fini- 
tary processes as those with a bounded value of mutual 
information between past and future behaviors. Here, 
we remain close to these original definitions, giving ex- 
plicit structural models, both classical and quantum, for 
finitary processes. 

In this, we use formal language theory. Our use of formal
language theory differs from most, though, in how it
analyzes the connection between a language and the systems
that can generate it. In brief, we observe a system
through a finite-resolution measuring instrument, representing
each output with a symbol $\sigma$ from a discrete alphabet
$\Sigma$. The temporal behavior of a system, then, is
a string or a word consisting of a succession of measurement
symbols. The collection of all (and only) those
words is the language that captures the possible temporal
behaviors of the system.

Definition. A formal language $\mathcal{L}$ is a set of words
$w = \sigma_0 \sigma_1 \sigma_2 \ldots$, each of which consists of a finite series
of symbols $\sigma_t \in \Sigma$ from a discrete alphabet $\Sigma$.

In the following $\lambda$ denotes the empty word. $\Sigma^*$ denotes
the set of all possible words, including $\lambda$, of any length
formed using symbols in $\Sigma$. We denote a word of length
$L$ by $\sigma^L = \sigma_0 \sigma_1 \ldots \sigma_{L-1}$, with $\sigma_t \in \Sigma$. The set of all
words of length $L$ is $\Sigma^L$.

Since a formal language, as we use the term, is a set
of observed words generated by a process, each subword
$\sigma_t \sigma_{t+1} \ldots \sigma_{u-1} \sigma_u$, $t \leq u$, $t, u = 0, 1, \ldots, L-1$, of
a word $\sigma^L$ has also been observed and is considered part
of the language. This leads to the following definition.

Definition. A language $\mathcal{L}$ is subword closed if, for each
$w \in \mathcal{L}$, all of $w$'s subwords $\mathrm{sub}(w)$ are also members of
$\mathcal{L}$: $\mathrm{sub}(w) \subseteq \mathcal{L}$.

Finally, we imagine that a system can run for an arbi- 
trarily long time and so the language describing its be- 
haviors has words of arbitrary length. In this way, a 
subword-closed formal language — as a set of arbitrarily 
long series of measurement outcomes — represents the al- 
lowed (and, implicitly, disallowed) behaviors of a system. 

Beyond a formal language listing which words (or be- 
haviors) occur and which do not, we are also interested 
in the probability of their occurrence. Let $\Pr(w)$ denote
the probability of word $w$. Then we have the following.

Definition. A stochastic language $\mathcal{S}$ is a formal language
with a word distribution $\Pr(w)$ that is normalized
at each length $L$:

$$\sum_{\{w \in \Sigma^L\}} \Pr(w) = 1 , \quad L = 1, 2, 3, \ldots \tag{1}$$

with $0 \leq \Pr(w) \leq 1$.

Definition. The joint probability of symbol $\sigma$ following
word $w$ is written $\Pr(w\sigma)$.

Definition. The conditional probability $\Pr(\sigma|w)$ of
symbol $\sigma$ given the preceding observation of word $w$ is

$$\Pr(\sigma|w) = \Pr(w\sigma)/\Pr(w) , \tag{2}$$

when $\Pr(w) > 0$; otherwise, $\Pr(\sigma|w) = 0$.

For purposes of comparison between various computa- 
tional models, it is helpful to refer directly to the set of 
words in a stochastic language S. This is the support of 
a stochastic language: 

$$\mathrm{supp}(\mathcal{S}) = \{ w \in \mathcal{S} : \Pr(w) > 0 \} . \tag{3}$$

These lead us, finally, to define the main object of 
study. 

Definition. A process language $\mathcal{P}$ is a stochastic language
that is subword closed and obeys the consistency
condition $\Pr(\sigma^L) \geq \Pr(\sigma^L \sigma)$.

A process language represents all of a system's possible
behaviors, $w \in \mathrm{supp}(\mathcal{P})$, and their probabilities $\Pr(w)$
of occurrence. In its completeness it could be taken as a
model of the system, but at best it is a rather prosaic and
unwieldy representation. Indeed, a model of a process is
usually intended to be a more compact description than
a literal listing of observations. In the best of circumstances
a model's components capture some aspect of a
system's structure and organization. Here we will be even
more specific: the models that we focus on not only
have to describe a process language, but they will also
consist of two structural components: states and transitions
between them. (One should contrast the seeming
obviousness of the latter with the fact that there are alternative
computational models, such as grammars, which
do not use the concept of state.)

To illustrate process languages we give an example
in Fig. 1, which shows a language, from the Golden
Mean Process, and its word distribution at different
word lengths. In this process language $\Sigma = \{0, 1\}$ and
word 00 and all words containing it have zero probability.
Moreover, if a 1 is seen, then the next $\sigma \in \Sigma$ occurs with
fair probability.
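
The definitions of this section can be checked mechanically on a small example. The following Python sketch is illustrative only. The table `PR` holds word probabilities up to length 3 that we worked out by hand for the Golden Mean Process, under the assumption that the process is observed in its stationary regime; the function names are ours and are not part of the formalism.

```python
from fractions import Fraction as F
from itertools import product

# Hand-computed word distribution up to L = 3 (assumption: stationary start,
# no "00", and a fair choice of the next symbol after a 1).
PR = {
    "0": F(1, 3), "1": F(2, 3),
    "01": F(1, 3), "10": F(1, 3), "11": F(1, 3),
    "010": F(1, 6), "011": F(1, 6), "101": F(1, 3),
    "110": F(1, 6), "111": F(1, 6),
}

def pr(word):
    """Word probability; words missing from the table have probability 0."""
    return PR.get(word, F(0))

def is_normalized(max_len, alphabet="01"):
    """Eq. (1): probabilities sum to 1 at each length L."""
    return all(
        sum(pr("".join(w)) for w in product(alphabet, repeat=L)) == 1
        for L in range(1, max_len + 1)
    )

def conditional(symbol, word):
    """Eq. (2): Pr(symbol | word), defined as 0 when Pr(word) = 0."""
    return pr(word + symbol) / pr(word) if pr(word) > 0 else F(0)

def support():
    """Eq. (3): the words with positive probability."""
    return {w for w, p in PR.items() if p > 0}

def is_subword_closed(words):
    """Every contiguous subword of every word must itself be in the set."""
    return all(
        w[i:j] in words
        for w in words
        for i in range(len(w))
        for j in range(i + 1, len(w) + 1)
    )

def is_consistent(max_len, alphabet="01"):
    """Consistency condition: Pr(w) >= Pr(w + s) for every symbol s."""
    return all(
        pr(w) >= pr(w + s)
        for w in PR if len(w) < max_len
        for s in alphabet
    )
```

On this table all three checks succeed, and `conditional("1", "0")` evaluates to 1 because a 0 is always followed by a 1.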

Figure 1 plots the base-2 logarithm of the word probabilities
versus the binary string $\sigma^L$, represented as the
base-2 real number $0.\sigma^L = \sum_{t=0}^{L-1} \sigma_t 2^{-(t+1)} \in [0, 1]$. At
length $L = 1$ (upper leftmost plot) both words 0 and 1
are allowed but have different probabilities. At $L = 2$ the
first disallowed string 00 occurs. As $L$ grows an increasing
number of words are forbidden: those containing the
shorter forbidden word 00. As $L \to \infty$ the set of allowed
words forms a self-similar, uncountable, closed, and disconnected
(Cantor) set in the interval $[0, 1]$ [14]. Note
that the language is subword closed. The process's name
comes from the fact that the number of
allowed words grows exponentially with $L$ at a rate given
by the logarithm of the golden mean $\phi = \frac{1}{2}(1 + \sqrt{5})$.
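
The growth rate of the number of allowed words can be verified numerically. The sketch below is an illustration under our own naming: it counts binary words without the subword 00 both by brute-force enumeration and by the Fibonacci-type recursion $N(L) = N(L-1) + N(L-2)$, and compares the per-symbol growth rate with $\log_2 \phi$.

```python
import math
from itertools import product

PHI = (1 + math.sqrt(5)) / 2  # the golden mean

def allowed_words(L):
    """All binary words of length L that do not contain the forbidden word 00."""
    words = ("".join(w) for w in product("01", repeat=L))
    return [w for w in words if "00" not in w]

def count_allowed(L):
    """Same count without enumeration: N(1) = 2, N(2) = 3, N(L) = N(L-1) + N(L-2)."""
    a, b = 2, 3
    for _ in range(L - 1):
        a, b = b, a + b
    return a

def growth_rate(L):
    """Bits per symbol gained when going from length L to L + 1."""
    return math.log2(count_allowed(L + 1) / count_allowed(L))
```

For moderate `L` the value of `growth_rate(L)` agrees with `math.log2(PHI)` to many digits, which is the sense in which the golden mean sets the exponential growth rate.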



III. STOCHASTIC TRANSDUCERS 

The process languages developed above require a new 
kind of finite-state machine to represent them. And so, 
our immediate goal is to construct a consistent formalism 
for machines that can recognize, generate, and transform 
process languages. We refer to the most general ones 
as stochastic transducers. We will then specialize these 
transducers into recognizers and generators. 

A few comments on various kinds of stochastic transducer
introduced by others will help to motivate our
approach, which has the distinct goal of representing
process languages. Paz defines stochastic sequential machines
that are, in effect, transducers [26]. Rabin defines
probabilistic automata that are stochastic sequential machines
with no output [27]. Neither, though, considers
process languages or the "generation" of any language
for that matter. Vidal et al. define stochastic transducers,
though based on a different definition of stochastic
language [28]. As a result, their stochastic transducers
cannot represent process languages.
FIG. 1: Example of a process language: In the Golden Mean
Process, with alphabet $\Sigma = \{0, 1\}$, word 00 and all words containing
it have zero probability. All other words have nonzero
probability. The logarithm base 2 of the word probabilities
is plotted versus the binary string $\sigma^L$, represented as base-2
real number "$0.\sigma^L$". To allow word probabilities to be compared
at different lengths, the distribution is normalized on
$[0, 1]$; that is, the probabilities are calculated as densities.

A. Definition 

Our definition of a stochastic transducer parallels Paz's 
stochastic sequential machines. 

Definition. A stochastic finite-state transducer (ST) is
a tuple $\{S, X, Y, \{T(y|x)\}\}$ where

1. $S$ is a finite set of states, including a start state $s_0$.

2. $X$ and $Y$ are finite alphabets of input and output
symbols, respectively.

3. $\{T(y|x) : x \in X, y \in Y\}$ is a set of square substochastic
matrices of dimension $|S|$, one for each output-input
pair $y|x$. The matrix entry $T_{ij}(y|x)$ is the
conditional probability, when in state $i$ and reading
in symbol $x$, of going to state $j$ and emitting symbol $y$.

Generally, a stochastic transducer (ST) operates by
reading in symbols that, along with the current state,
determine the next state(s) and output symbol(s). At
each step a symbol $x \in X$ is read from the input
word. The transducer stochastically chooses a transition
$T_{ij}(y|x) > 0$, emits symbol $y \in Y$, and updates its state
from $i$ to $j$. An ST thus maps an input word to one or
more output words. Unless otherwise explicitly stated,
in our models there is no delay between reading an input
symbol and producing the associated output symbols.

STs are our most general model of finitary (and non- 
quantum) computation. They are structured so that spe- 
cialization leads to a graded family of models of increas- 
ing sophistication. 
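
The operation of an ST described above can be sketched as a short simulation. The example is a minimal illustration, not a construction from the text: the two-state transducer `T_TOY`, its probabilities, and the function names are hypothetical. We also assume that when the substochastic rows leave some probability unassigned, the machine halts without output.

```python
import random

# Hypothetical two-state transducer: state 0 copies the input symbol and
# moves to state 1 with probability 1/4; state 1 flips the input and returns to 0.
T_TOY = {  # key: (output y, input x); value: matrix with entries T_ij(y|x)
    ("0", "0"): [[0.75, 0.25], [0.0, 0.0]],
    ("1", "1"): [[0.75, 0.25], [0.0, 0.0]],
    ("1", "0"): [[0.0, 0.0], [1.0, 0.0]],
    ("0", "1"): [[0.0, 0.0], [1.0, 0.0]],
}

def st_step(T, state, x, rng):
    """Read x in `state`; return (output y, next state), or None if the ST halts."""
    r = rng.random()
    for (y, x_in), M in T.items():
        if x_in != x:
            continue
        for j, p in enumerate(M[state]):
            r -= p
            if p > 0 and r < 0:
                return y, j
    return None  # leftover probability mass: no transition is taken

def st_run(T, xs, seed=0, start=0):
    """Map an input word to one sampled output word, symbol by symbol."""
    rng = random.Random(seed)
    state, ys = start, []
    for x in xs:
        step = st_step(T, state, x, rng)
        if step is None:
            break
        y, state = step
        ys.append(y)
    return "".join(ys)
```

Because each output is emitted in the same step in which the input is read, the sampled output word has the same length as the input whenever the machine does not halt.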



B. Graphical Representation 

The set $\{T(y|x)\}$ can be represented as a directed
graph $\mathcal{G}(T)$ with the nodes corresponding to states,
the matrix row and column indices. An edge connects
nodes $i$ and $j$ and corresponds to an element $T_{ij} > 0$
that gives the nonzero transition probability from state $i$
to state $j$. Edges are labeled $x|p|y$ with the input symbol
$x \in X$, output symbol $y \in Y$, and transition probability
$p = T_{ij}(y|x)$. Since an ST associates outputs with
transitions, in fact, what we have defined is a Mealy ST,
which differs from the alternative Moore ST in which an
output is associated with a state [26].

Definition. A path is a series of edges visited sequentially
when making state-to-state transitions with $T_{ij} > 0$.

Definition. A directed graph $\mathcal{G}$ is connected if there is
at least one path between every pair of states.

Definition. A directed graph $\mathcal{G}$ is strongly connected if
for every pair of states, $i$ and $j$, there is at least one path
from $i$ to $j$ and at least one from $j$ to $i$.

The states in the graph of an ST can be classified as 
follows, refining the definitions given by Paz [26, p. 85].

Definition. A state j is a consequent of state i if there 
is a path beginning at i and ending at j . 

Definition. A state is called transient if it has a con- 
sequent of which it is not itself a consequent. 

Definition. A state is called recurrent if it has at least 
one consequent of which it is itself a consequent. 

Note that transient and recurrent states can be over- 
lapping sets. We therefore make the following distinc- 
tions. 

Definition. A state is called asymptotically recurrent 
if it is recurrent, but not transient. 

Definition. A state is called transient recurrent if it is 
transient and recurrent. 
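
The state classification can be computed directly from the graph. The sketch below is one possible reading of the definitions, with a path taken to contain at least one edge; the adjacency-list encoding, the label for states without any consequent, and the five-state example graph are our own assumptions.

```python
def consequents(adj):
    """adj[i] is the set of states j with T_ij > 0.
    Returns reach[i], the set of all consequents of i (paths of one or more edges)."""
    reach = []
    for i in range(len(adj)):
        seen, stack = set(), list(adj[i])
        while stack:
            j = stack.pop()
            if j not in seen:
                seen.add(j)
                stack.extend(adj[j])
        reach.append(seen)
    return reach

def classify(adj):
    """Label each state according to the definitions of transient and recurrent."""
    reach = consequents(adj)
    labels = {}
    for i in range(len(adj)):
        transient = any(i not in reach[j] for j in reach[i])
        recurrent = any(i in reach[j] for j in reach[i])
        if recurrent and not transient:
            labels[i] = "asymptotically recurrent"
        elif recurrent and transient:
            labels[i] = "transient recurrent"
        elif transient:
            labels[i] = "transient"
        else:
            labels[i] = "no consequents"  # dead-end state, not covered by the definitions
    return labels

# Hypothetical graph: 4 -> 0, 0 <-> 1, 1 -> 2, 2 <-> 3
EXAMPLE = [{1}, {0, 2}, {3}, {2}, {0}]
```

In `EXAMPLE`, state 4 is purely transient, states 0 and 1 form a cycle that can be left and so are transient recurrent, and states 2 and 3 form the single asymptotically recurrent set.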

Generally speaking, an ST starts in a set of transient 
states and ultimately transits to one or another of the 
asymptotically recurrent subsets. That is, there can be 
more than one set of asymptotically recurrent states. Un- 
less stated otherwise, though, in the following we will 
consider STs that have only a single set of asymptoti- 
cally recurrent states. 

C. Word Probabilities 

Before discussing the process languages associated 
with an ST we must introduce the matrix notation re- 
quired for analysis. To facilitate comparing classical 
stochastic models and their quantum analogs, we adapt 
Dirac's bra-ket notation: Row vectors $\langle \cdot |$ are called bra
vectors; and column vectors $| \cdot \rangle$, ket vectors.

Notation. Let $|\eta\rangle = (1, 1, \ldots, 1, 1)^T$ denote a column
vector with $|S|$ components that are all 1s.

Notation. Let $\langle \pi | = (\pi_0, \pi_1, \ldots, \pi_{|S|-1})$ be a row vector
whose components, $0 \leq \pi_i \leq 1$, give the probability of
being in state $i$. The vector is normalized in probability:
$\sum_{i=0}^{|S|-1} \pi_i = 1$. The initial state distribution, with all of
the probability concentrated in the start state, is denoted
$\langle \pi^0 | = (1, 0, \ldots, 0)$.

For a series of $L$ input symbols the action of the corresponding
ST is a product of transition matrices:

$$T(y^L|x^L) = T(y_0|x_0) T(y_1|x_1) \cdots T(y_{L-1}|x_{L-1}) ,$$

whose elements $T_{ij}(y^L|x^L)$ give the probability of making
a transition from state $i$ to $j$ and generating output $y^L$
when reading input $x^L$.

Starting in state distribution $\langle \pi^0 |$, the state distribution
after reading in word $x^L$ and emitting word $y^L$ is

$$\langle \pi(y^L|x^L) | = \langle \pi^0 | T(y^L|x^L) . \tag{4}$$

This can then be used to compute the probability of reading
out word $y^L$ conditioned on reading in word $x^L$:

$$\Pr(y^L|x^L) = \langle \pi(y^L|x^L) | \eta \rangle . \tag{5}$$
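
Equations (4) and (5) amount to propagating a row vector through a sequence of matrices and summing its components. The following sketch illustrates this with exact rational arithmetic. The transducer `T_TOY` is a hypothetical two-state example of our own (state 0 copies the input and moves to state 1 with probability 1/4; state 1 flips the input and returns to state 0), and the function names are assumptions.

```python
from fractions import Fraction as F

T_TOY = {  # key: (output y, input x); value: matrix with entries T_ij(y|x)
    ("0", "0"): [[F(3, 4), F(1, 4)], [0, 0]],
    ("1", "1"): [[F(3, 4), F(1, 4)], [0, 0]],
    ("1", "0"): [[0, 0], [1, 0]],
    ("0", "1"): [[0, 0], [1, 0]],
}

def row_times_matrix(pi, M):
    """Row vector times matrix: <pi| M."""
    return [sum(pi[i] * M[i][j] for i in range(len(pi))) for j in range(len(M[0]))]

def st_word_probability(T, ys, xs, start=0):
    """Eqs. (4)-(5): Pr(y^L | x^L) = <pi^0| T(y_0|x_0) ... T(y_{L-1}|x_{L-1}) |eta>."""
    n = len(next(iter(T.values())))
    pi = [F(int(i == start)) for i in range(n)]  # <pi^0|: all weight on the start state
    for y, x in zip(ys, xs):
        pi = row_times_matrix(pi, T[(y, x)])
    return sum(pi)  # contraction with |eta> = (1, ..., 1)^T
```

For this example `st_word_probability(T_TOY, "01", "01")` gives 3/4, the probability that both symbols are copied faithfully.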

IV. STOCHASTIC RECOGNIZERS AND 
GENERATORS 

We are ready now to specialize this general architec- 
ture into classes of recognizing and generating devices. In 
each case we address those aspects that justify our calling 
them models; viz., we can calculate various properties of 
the process languages that they represent directly from 
the machine states and transitions, such as the word dis- 
tribution and statistical properties that derive from it. 

Generally speaking, a recognizer reads in a word and 
has two possible outputs for each symbol being read in: 
accept or reject. This differs from the common model [29]
of reading in a word of finite length and only at the end 
deciding to accept or reject. This aspect of our model is 
a consequence of reading in process languages which are 
subword closed. 

In either the recognition or generation case, we will 
discuss only models for arbitrarily long, but finite-time 
observations. This circumvents several technical issues 
that arise with recognizing and generating infinite-length 
strings, which is the subject of the $\omega$-language theory of
Büchi automata [30].

Part of the burden of the following sections is to intro- 
duce a number of specializations of stochastic machines. 
Although it is rarely good practice to use terminology 
before it is defined, in the present setting it will be help- 
ful when tracking the various machine types to explain 
our naming and abbreviation conventions now. 

In the most general case (in particular, when the text
says nothing else) we will discuss, as we have just done,
machines. These are input-output devices or transducers
and we will denote this in any abbreviation with a
capital T. These will be specialized to recognizers, abbreviated
R, and generators, denoted G. Within these basic
machine types, there will be various alternative implementations.
We will discuss stochastic (S) and quantum
(Q) versions. Within these classes we will also distinguish
the additional property of determinism, denoted D.

As noted above, the entire development concerns ma- 
chines with a finite set of states. And so, we will almost 
always drop the adjectives "finite-state" and "finitary",
unless we wish to emphasize these aspects in particular.

A. Stochastic Recognizers 

Stochastic devices that recognize inputs have been variously
defined since the first days of automata theory. Rabin's
probabilistic automata [27], for example, associate a
stochastic matrix to each input symbol so that for a given
state and input symbol the machine stochastically transitions
to a successor state. Accepting an input string
$x^L$ with cut point $\lambda$ is defined operationally by repeatedly
reading in the same string and determining that the
acceptance probability was above threshold: $p(x^L) > \lambda$.
Accepting or rejecting with isolated cut point $\lambda$ is defined
for some $\delta > 0$ with $|p(x^L) - \lambda| \geq \delta$; that is,
$p(x^L) \geq \lambda + \delta$ or $p(x^L) \leq \lambda - \delta$, respectively.
Here we introduce a stochastic recognizer that applies 
a variable form of cut-point recognition to process lan- 
guages with the net effect of representing the word dis- 
tribution within a uniform tolerance. 

One difference between the alternative forms of accep- 
tance is the normalization over equal-length strings for 
stochastic language recognition. Thus, Rabin's proba- 
bilistic automata do not recognize stochastic languages, 
but merely assign a number between 0 and 1 to each
word being read in. The same is true for Paz's stochastic 
sequential machines. 

Definition. A stochastic finite-state recognizer (SR) is
a stochastic transducer with $|Y| = 1$ and $T(y|x) = T(x)$.

One can think of the output symbol as accept. If no 
symbol is output the recognizer has halted and rejected 
the input. 

An SR's state-to-state transition matrix,

$$T = \sum_{x \in X} T(x) , \tag{6}$$

is a stochastic matrix.

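Definition. An SR accepts a process language $\mathcal{P}$ with
threshold $\delta$, if and only if for all $w \in \mathcal{P}$

$$\left| \Pr(w) - \langle \pi^0 | T(w) | \eta \rangle \right| \leq \delta \tag{7}$$

and for all $w \notin \mathcal{P}$, $\langle \pi^0 | T(w) | \eta \rangle = 0$.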

The first criterion for accepting a process language is
that all words in the language lead the machine through
a series of transitions with positive probability and that
words not in the language are assigned zero probability.
That is, it accepts the support of the language. The
second criterion is that the probability of accepting a
word in the language is equal to the word's probability
within a threshold $\delta$. Thus, an SR not only tests for
membership in a formal language, it also recognizes a
function: the probability distribution of the language.
For example, if $\delta = 0$ the SR accepts exactly a process
language's word distribution. If $\delta > 0$ it accepts the probability
distribution with some fuzziness, still rejecting all
of the language's probability-0 words. As mentioned before,
recognition happens at each time step. This means
that in practice the experimenter runs an ensemble of
SRs on the same input. The frequency of acceptance can
then be compared to the probability of the input string
computed from the $T(x)$.

Definition. The stationary state distribution $\langle \pi^s |$,
which gives the asymptotic state visitation probabilities,
is determined by the left eigenvector of the state-to-state
transition matrix $T$ of Eq. (6):

$$\langle \pi^s | = \langle \pi^s | T , \tag{8}$$

normalized in probability: $\sum_{i=0}^{|S|-1} \pi^s_i = 1$.
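
As a sketch of how the stationary distribution can be obtained in practice, the following code solves the left-eigenvector condition together with the normalization constraint by exact Gaussian elimination. It assumes that the stationary distribution is unique (a single asymptotically recurrent set); the function name and the two-state example matrix are ours.

```python
from fractions import Fraction as F

def stationary(T):
    """Solve <pi| = <pi| T with sum(pi) = 1 for a stochastic matrix T.
    Assumes a unique stationary distribution."""
    n = len(T)
    # Row j encodes the balance equation sum_i pi_i (T_ij - delta_ij) = 0.
    A = [[F(T[i][j]) - (1 if i == j else 0) for i in range(n)] for j in range(n)]
    A[-1] = [F(1)] * n                 # replace one redundant equation by normalization
    b = [F(0)] * (n - 1) + [F(1)]
    for col in range(n):               # Gauss-Jordan elimination
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]

# Hypothetical two-state stochastic matrix
T_EXAMPLE = [[F(1, 2), F(1, 2)], [1, 0]]
```

For `T_EXAMPLE` the result is the distribution (2/3, 1/3), which is left unchanged by a further application of the matrix.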

For a series $x_0 x_1 \cdots x_{L-1}$ of input symbols the action
of the corresponding SR upon acceptance is a product of
transition matrices:

$$T(x^L) = T(x_0) T(x_1) \cdots T(x_{L-1}) ,$$

whose elements $T_{ij}(x^L)$ give the probability of making a
transition from state $i$ to $j$ and generating output accept
when reading input $x^L$. If the SR starts in state distribution
$\langle \pi^0 |$, the state distribution $\langle \pi(x^L) |$ after accepting
word $x^L$ is

$$\langle \pi(x^L) | = \langle \pi^0 | T(x^L) . \tag{9}$$

In this case, the probability of accepting $x^L$ is

$$\Pr(x^L) = \langle \pi^0 | T(x^L) | \eta \rangle . \tag{10}$$

We have the following special class of SRs. 

Definition. A stochastic deterministic finite-state rec- 
ognizer (SDR) is a stochastic finite-state recognizer 
whose substochastic transition matrices T(x) have at 
most one nonzero element per row. 

A word accepted by an SDR is associated with one
and only one path. This allows us to give an efficient expression
for the word distribution of the language exactly
($\delta = 0$) recognized by an SDR:

$$\Pr(x^L) = T_{s_0 s_1}(x_0) T_{s_1 s_2}(x_1) \cdots T_{s_{L-1} s_L}(x_{L-1}) , \tag{11}$$

where $s_1 s_2 \cdots s_L$ is the unique series of states along the
path selected by $x^L$.
FIG. 2: Stochastic deterministic recognizer for the Golden
Mean process language of Fig. 1. The edges are labeled $x|p$,
where $x \in X$ and $p = T_{ij}(x)$. The start state, $\langle \pi^0 | = (1, 0, 0)$,
is double circled. The reject state and all transitions to it are
omitted; as is the output accept on all edges.

There is an important difference here with Eq. (10).
Due to determinism, the computational cost for computing
the word probability $\Pr(x^L)$ from SDRs increases
only linearly with $L$; whereas it is exponential for SRs.

Figure 2 shows an example of an SDR that recognizes
the Golden Mean process language. That is, it rejects
any word containing two consecutive 0s and accepts any
other word with nonzero probability. This leads, in turn,
to the self-similar structure of the support of the word
probability distribution noted in Fig. 1.

A useful way to characterize this property is to list
a process language's irreducible forbidden words, the
shortest disallowed words. In the case of the Golden
Mean formal language, this list has one member: $\mathcal{F} = \{00\}$.
Each irreducible word is associated with a family
of longer words containing it. This family of forbidden
words forms a Cantor set in the space of sequences, as
described above. (Recall Fig. 1.)

If we take the threshold to be $\delta = 0$, then the SDR
recognizes only the process language shown in Fig. 1. If
$\delta = 1$, in contrast, the SDR would accept process languages
with any distribution on the Golden Mean process
words. That is, it always recognizes the language's
support.

One can easily calculate word probabilities and state
distributions for the Golden Mean Process using the
SDR's matrix representation:

$$T(0) = \begin{pmatrix} 0 & 0 & \frac{1}{3} \\ 0 & 0 & \frac{1}{2} \\ 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad T(1) = \begin{pmatrix} 0 & \frac{2}{3} & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 1 & 0 \end{pmatrix} . \tag{12}$$

We use Eq. (10) with the start state distribution $\langle \pi^0 | = (1, 0, 0)$
to calculate the $L = 1$ word probabilities:

$$\Pr(0) = \langle \pi^0 | T(0) | \eta \rangle = \frac{1}{3} ,$$
$$\Pr(1) = \langle \pi^0 | T(1) | \eta \rangle = \frac{2}{3} .$$

(Eq. (11) would be equally applicable.) At $L = 3$ one
finds for $x^3 = 011$:

$$\Pr(011) = \langle \pi^0 | T(011) | \eta \rangle = \langle \pi^0 | T(0) T(1) T(1) | \eta \rangle = \frac{1}{6} .$$


As with SDRs, given the generator's state and an out- 
put symbol, the next state is uniquely determined. Sim- 
ilarly, it is less costly to compute word probabilities: 



In fact, all L — 3 words have the same probability, except 
for a; 3 = 101, which has a higher probability, Pr(101) = 
±, and x 3 G {000,001, 100}, for which Pr(x 3 ) = 0. (Cf. 
the L = 3 word distribution in Fig. Q]) 

The conditional probability of a 1 following a 0, say, is 
calculated in a similarly straightforward manner: 



Pr(l|0) 



Pr(01) (7r |T(0)T(l)|r/) 



Pr(0) 



(7r°|T(0)|rj) 



1 



Whereas, the probability Pr(0|0) of a following a is 
zero, as expected. 



B. Stochastic Generators 

As noted in the introduction, finite-state machines gen- 
erating strings of symbols can serve as useful models for 
structure in dynamical systems. They have been used 
as computational models of classical dynamical systems 
for some time; see Refs. [H, H3, El, H, EH , for 
example. 

As we also noted, automata that only generate outputs 
are less often encountered in formal language theory (29[ 
than automata operating as recognizers. One reason is 
that redefining a conventional recognizer to be a device 
that generates output words is incomplete. A mechanism 
for choosing which of multiple transitions to take when 
leaving a state needs to be specified. And this leads nat- 
urally to probabilistic transition mechanisms, as one way 
of completing a definition. We will develop finite-state 
generators by paralleling the development of SRs. 

Definition. A stochastic finite-state generator (SG) is 
a stochastic transducer with \X\ = 1. 

The input symbol can be considered a clock signal that 
drives the machine from state to state. The transition 
matrices can be simplified to T(y) — T(y\x). An SG's 
state-to-state transition probabilities are given by the 
stochastic state-to- state transition matrix: 



(13) 



Word probabilities are calculated as with SRs, save that 
one exchanges input symbols x with output symbols y: 



Pv(y L ) = (AT(y L )\v) ■ 
We define the following special class of SGs. 



(14) 



Definition. A stochastic deterministic finite-state gen- 
erator (SDG) is a stochastic finite-state generator in 
which each matrix T(y) has at most one nonzero entry 
per row. 



Pr(y L ) = T SoSl (y )T slS2 (y 1 )--.T SL _, SL (y L _ 1 ) . (15) 

Given an initial state distribution, a sum is taken over 
states, weighted by their probability. Even so, the com- 
putation increases only linearly with L. In the following 
we concentrate on SDGs. 

As an example, consider the generator for the Golden 
Mean process language. Its matrix representation is 
the same as for the Golden Mean recognizer given in 
Eqs. (12). Its graphical representation is the same as
in Fig. 2, except that the edge labels $x|p$ there should
be given as $p|y$. (We return to the relationship between
recognizers and equivalent generators shortly.) It turns 
out this is the smallest generator, but the proof of this 
will be presented elsewhere. 
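The two ways of computing word probabilities for a deterministic generator can be compared in a short sketch. It assumes the Golden Mean matrices of Eq. (12) read as below (start-state branching probabilities 1/3 and 2/3), an assumed state ordering (start, A, B), and illustrative function names; none of these labels come from the text itself.

```python
from fractions import Fraction as F

# Assumed reading of Eq. (12); rows and columns ordered (start, A, B).
T = {
    "0": [[0, 0, F(1, 3)], [0, 0, F(1, 2)], [0, 0, 0]],
    "1": [[0, F(2, 3), 0], [0, F(1, 2), 0], [0, 1, 0]],
}


def word_prob_matrix(word, pi0=(1, 0, 0)):
    """Matrix form: <pi0| T(y_0) ... T(y_{L-1}) |eta>, with |eta> the all-ones vector."""
    row = list(pi0)
    for y in word:
        row = [sum(row[i] * T[y][i][j] for i in range(3)) for j in range(3)]
    return sum(row)


def word_prob_path(word, start=0):
    """Deterministic form: follow the unique path, one multiplication per symbol."""
    state, prob = start, F(1)
    for y in word:
        successors = [j for j in range(3) if T[y][state][j] != 0]
        if not successors:  # no allowed transition, so the word is forbidden
            return F(0)
        prob *= T[y][state][successors[0]]
        state = successors[0]
    return prob


if __name__ == "__main__":
    for w in ("0", "1", "011", "101", "100"):
        print(w, word_prob_matrix(w), word_prob_path(w))
```

Because each row of each matrix has at most one nonzero entry, the path version never branches, which is the sense in which the cost grows only linearly with the word length.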

One can easily calculate word probabilities and state 
distributions for the Golden Mean Process using the 
SDG's matrices. Let us consider a method, different from 
that used above for SRs, that computes probabilities us- 
ing the asymptotically recurrent states only. This is done 
using the stationary state distribution and the transi- 
tion matrices restricted to the asymptotically recurrent 
states. The method is useful whenever the start state 
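is not known, but the asymptotic behavior of the machine
is. The transition matrices for the SDG, following
Eqs. (12), become:

$$T(0) = \begin{pmatrix} 0 & \tfrac12 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad T(1) = \begin{pmatrix} \tfrac12 & 0 \\ 1 & 0 \end{pmatrix} . \tag{16}$$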



The stationary state distribution $\langle\pi^s|$ is the left eigenvector
of the state-to-state transition matrix $T$, Eq. (13):

$$\langle\pi^s| = \left(\tfrac23, \tfrac13\right) .$$

Assuming that the initial state is not known, but the
process has been running for a long time, we use Eq. (14)
with $\langle\pi^s|$ to calculate the $L = 1$ word probabilities:

$$\Pr(0) = \langle\pi^s|T(0)|\eta\rangle = \tfrac13 , \qquad \Pr(1) = \langle\pi^s|T(1)|\eta\rangle = \tfrac23 .$$

At $L = 3$ one finds for $y^3 = 011$:

$$\Pr(011) = \langle\pi^s|T(011)|\eta\rangle = \langle\pi^s|T(0)T(1)T(1)|\eta\rangle = \tfrac16 .$$

All $L = 3$ words have the same probability, except for
$y^3 = 101$, which has a higher probability, $\Pr(101) = \tfrac13$,
and $y^3 \in \{000, 001, 100\}$, for which $\Pr(y^3) = 0$. (Cf. the
$L = 3$ distribution in Fig. 1.)

These are the same results found for the Golden Mean 
Process recognizer. There, however, we used a different 
initial distribution. The general reason why these two 
calculations lead to the same result is not obvious, but 
an explanation would take us too far afield. 
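The stationary-distribution calculation can be checked with a few lines of exact arithmetic. This is only a sketch: the recurrent-state matrices are taken as in Eq. (16) with an assumed state ordering (A, B), and the closed-form two-state solver `stationary_two_state` is an illustrative helper rather than a method used in the text.

```python
from fractions import Fraction as F

# Assumed recurrent-state matrices of the Golden Mean generator, ordered (A, B).
T0 = [[F(0), F(1, 2)], [F(0), F(0)]]
T1 = [[F(1, 2), F(0)], [F(1), F(0)]]
# State-to-state transition matrix: the sum over output symbols.
T = [[T0[i][j] + T1[i][j] for j in range(2)] for i in range(2)]


def left_apply(row, M):
    """Row vector times matrix."""
    return [sum(row[i] * M[i][j] for i in range(2)) for j in range(2)]


def stationary_two_state(T):
    """Solve pi T = pi with pi_A + pi_B = 1 for a two-state chain."""
    a, b = T[0][1], T[1][0]  # A -> B and B -> A probabilities
    return [b / (a + b), a / (a + b)]


PI_S = stationary_two_state(T)


def word_prob(word):
    """<pi_s| T(y_0) ... T(y_{L-1}) |eta>, with |eta> the all-ones vector."""
    row = list(PI_S)
    for y in word:
        row = left_apply(row, T0 if y == "0" else T1)
    return sum(row)


if __name__ == "__main__":
    print("stationary:", PI_S)
    for w in ("0", "1", "011", "101"):
        print(w, word_prob(w))
```

Starting from the stationary distribution over the two recurrent states gives the same word probabilities as starting the three-state machine in its start state.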

As a second example of an SDG consider the Even Process,
whose language consists of blocks of even numbers of
1s bounded by 0s. The substochastic transition matrices
for its recurrent states are

$$T(0) = \begin{pmatrix} \tfrac12 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad T(1) = \begin{pmatrix} 0 & \tfrac12 \\ 1 & 0 \end{pmatrix} . \tag{17}$$

FIG. 3: A deterministic generator of the Even Process: Blocks
of an even number of 1s are separated by 0s. Only the asymptotically
recurrent states are shown. Edges are labeled $p|y$,
where $y \in Y$ and $p = T_{ij}(y)$. The numbers in parentheses
give a state's asymptotic probability.

The corresponding graph is shown in Fig. 3. Notice
that the state-to-state transition matrix T is the same as 
the previous model of the Golden Mean Process. How- 
ever, the Even Process is substantially different; and 
its SDG representation lets us see how. The set of 
irreducible forbidden words is countably infinite [2^ |: 
$\mathcal{F} = \{01^{2k+1}0 : k = 0, 1, 2, \ldots\}$. Recall that the
Golden Mean Process had only a single irreducible for- 
bidden word {00}. One consequence is that the words in 
the Even Process have a kind of infinite correlation: the 
"evenness" of the length of 1-blocks is respected over ar- 
bitrarily long words. This makes the Even Process effec- 
tively non-finite: As long as a sequence of Is is produced, 
memory of the initial state distribution persists. Another 
difference is that the support of the word distribution has 
a countable infinity of distinct Cantor sets — one for each 
irreducible forbidden word. Thus, the Even Process falls 
into the broader class of finitary processes. 
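The countable family of irreducible forbidden words can be probed numerically. The sketch below assumes the Even Process matrices of Eq. (17) with state ordering (A, B) and the stationary distribution (2/3, 1/3) implied by the shared state-to-state matrix; the helper name `word_prob` is illustrative.

```python
from fractions import Fraction as F

# Assumed recurrent-state matrices of the Even Process, ordered (A, B).
T0 = [[F(1, 2), F(0)], [F(0), F(0)]]
T1 = [[F(0), F(1, 2)], [F(1), F(0)]]
PI_S = [F(2, 3), F(1, 3)]  # stationary distribution of T = T0 + T1


def word_prob(word):
    """<pi_s| T(y_0) ... T(y_{L-1}) |eta>, with |eta> the all-ones vector."""
    row = list(PI_S)
    for y in word:
        M = T0 if y == "0" else T1
        row = [sum(row[i] * M[i][j] for i in range(2)) for j in range(2)]
    return sum(row)


if __name__ == "__main__":
    for n in range(1, 7):
        w = "0" + "1" * n + "0"
        print(n, w, word_prob(w))
```

Words of the form 0, then n ones, then 0 have zero probability exactly when n is odd, for every n, which is the arbitrarily long-range constraint described above.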

C. Properties 

We can now describe the similarities and differences 
between stochastic and other kinds of recognizers and 
between the various classes of generators. Let S(M) denote
the stochastic language recognized or generated by
automaton M. Let P(C) denote the set of stochastic
languages generated or recognized by machines in class 
C. 

The relationships between the languages associated 
with the various machine types follow rather directly 
from their definitions. We swap input and output alpha- 
bets and reinterpret the same transition matrices, either 
as specifying $x|p$ or $p|y$ as required. All, that is, except
for the last two results, which may be unexpected. 

Proposition 1. For every SR, supp S(SR) is a regular 
language. 

Proof. The graph of an SR, removing the probabilities,
defines a finite-state recognizer and accepts, by definition,
a regular language |H/. This regular language is the support
of S(SR) by construction.

Proposition 2. For every SR, S(SR) is a process lan- 
guage. 

Proof. The first property to establish is that the set of
words recognized by an SR is subword closed: if $\Pr(x^L) > 0$,
then all $w \in \mathrm{sub}(x^L)$ have $\Pr(w) > 0$. This is guaranteed
by definition, since the first input symbol not encountering
an allowed transition leads to rejection of the
whole input; see the SR definition.

The second property to establish is that the word distribution
$\Pr(x^L)$ is normalized at each $L$. This follows
from the state-to-state transition matrix $T$ being stochastic.

Proposition 3. SGs and SRs generate and recognize,
respectively, the same set of languages: P(SG) = P(SR).

Proof. Consider SG's transition matrices $T(y)$ and form
a new set $T(x)$ in which $X = Y$. The $T(x)$ define an SR
that recognizes S(SG).

It follows that $P(SG) \subseteq P(SR)$.

Now consider SR's transition matrices $T(x)$ and form
a new set $T(y)$ in which $Y = X$. The $T(y)$ define an SG
that generates S(SR).

It follows that P(SG) = P(SR).

Corollary 1. For every SG, supp S(SG) is a regular 
language. 

Corollary 2. For every SG, S(SG) is a process lan- 
guage. 

Corollary 3. SDGs and SDRs generate and recognize,
respectively, the same set of languages: P(SDG) = P(SDR).

These equivalences are intuitive and expected. They 
do not, however, hint at the following, which turn on the 
interplay between nondeterminism and stochasticity. 

Proposition 4. There exists an SG such that $\mathcal{P}$(SG) is
not recognized by any SDR.

Proof. We establish this by example. Consider the nondeterministic
generator in Fig. 4, the Simple Nondeterministic
Source (SNS). To show that there is no possible
construction of an SDR we argue as follows. If a 0 appears,
then the generator is in state A. Imagine this is
then followed by a block $1^k$. At each $k$ the generator is
in either state A or B. The probability of seeing a 0 next
is ambiguous (either 0 or 1/2) and depends on the exact
history of internal states visited. Deterministic recognition
requires that a recognizer be in a state in which the
probability of the next symbol is uniquely given. While
reading in 1s the recognizer would need a new state for
each 1 connecting to the same state (state A) on a 0.
Since this is true for all $k$, there is no finite-state SDR
that recognizes the SNS's process language.

FIG. 4: A nondeterministic generator that produces a process
language not recognized by any (finite-state) SDR. Only
asymptotically recurrent states are shown. Edges are labeled
$p|y$, where $y \in \{0, 1\}$ and $p = T_{ij}(y)$.

Ref. [11] gives an SDR for this process that is minimal,
but has a countably infinite number of states. Note that
supp $\mathcal{P}$(SNS) is the support of the Golden Mean process
language.
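The need for infinitely many deterministic states can be made concrete with a small calculation. The sketch below is hedged: the text does not list the SNS transition probabilities, so the matrices are an assumption (every branch taken with probability 1/2, state ordering (A, B)), chosen to match the proof's statement that a 0 leads to state A and that the chance of a 0 is 0 from A and 1/2 from B.

```python
from fractions import Fraction as F

# Assumed SNS transitions, ordered (A, B):
#   A --1|1/2--> A,  A --1|1/2--> B,  B --1|1/2--> B,  B --0|1/2--> A
T0 = [[F(0), F(0)], [F(1, 2), F(0)]]
T1 = [[F(1, 2), F(1, 2)], [F(0), F(1, 2)]]


def step(row, M):
    """Row vector times matrix."""
    return [sum(row[i] * M[i][j] for i in range(2)) for j in range(2)]


def prob_zero_after_ones(k):
    """Pr(0 | 0 1^k): start in A (just after a 0), emit k ones, then ask for a 0."""
    row = [F(1), F(0)]
    for _ in range(k):
        row = step(row, T1)
    return sum(step(row, T0)) / sum(row)


if __name__ == "__main__":
    for k in range(0, 6):
        print(k, prob_zero_after_ones(k))
```

Under these assumed probabilities the conditional probability equals k/(2(k+1)), a different value for every k. A deterministic recognizer would therefore have to be in a distinct state after each block length, consistent with the countably infinite SDR mentioned above.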

Corollary 4. There exists an SR such that $\mathcal{P}$(SR) is
not generated by any SDG.

These propositions say, in essence, that deterministic
machines generate or recognize only a subset of the finitary
process languages. In particular, Props. 3 and 4 and
Cor. 3 imply proper containment: $P(SDR) \subset P(SG)$
and $P(SDG) \subset P(SR)$. This is in sharp contrast with
the standard result in formal language theory: deterministic
and nondeterministic automata recognize the same
class of languages, the regular languages [29].

This ends our development of classical machines and 
their specializations. We move on to their quantum 
analogs, following a strategy that is familiar by now. 

V. FINITARY QUANTUM PROCESSES 

As with stochastic processes, the evolution of a quan- 
tum system is monitored by a series of measurement 
outcomes — numbers registered in some way. Each out- 
come can be taken to be the realization of a random 
variable. The distribution over sequences of these ran- 
dom variables is what we call a quantum process. We 
will consider the finitary version of quantum processes in 
the same sense as used for the classical stochastic pro- 
cesses: The internal resources used during the evolution 
are finitely specified. 

A. Quantum States 

Quantum mechanics is sometimes viewed as a general- 
ization of classical probability theory with noncommut- 
ing probabilities. It is helpful, therefore, to compare clas- 
sical stochastic automata and quantum automata and, in 
particular, to contrast the corresponding notions of state. 
The goal is to appreciate what is novel in quantum au- 
tomata. The reader should have a working knowledge of 
quantum mechanics at the level of, say, Ref. [35].

In the classical (stochastic) setting an automaton has
internal states $S$ and also a distribution $\langle\pi|$ over them.
The distribution itself can be taken to be a "state",
but of what? One interpretation comes from considering
how an observer monitors a series of outputs from
a stochastic generator and predicts, with each observed
symbol, the internal state $s \in S$ the automaton is in.
This prediction is a distribution $\langle\pi|$ over the internal
states, one that represents the observer's best guess of
the automaton's current internal state. In this sense
the distribution is the state of the best predictor. If
$\langle\pi| = (0, \ldots, 0, \pi_i = 1, 0, \ldots, 0)$, then the observer knows
exactly what internal state, $s_i \in S$, the automaton is in.
For these special cases one can identify state distributions
and internal states.

Similarly, there are several kinds of state that one 
might define for a quantum automaton. Each quantum 
automaton will consist of internal states and we will take 
the state of the automaton to be a superposition over 
them. The central difference with classical (stochastic) 
automata is that the superposition over internal states 
is not a probability distribution. In particular, inter- 
nal states have complex amplitudes and, therefore, they 
potentially interfere. This, in turn, affects the process 
language associated with the quantum automaton. 

In contrast with quantum automata, the state of a 
quantum dynamical system depends on the choice of a 
basis that spans its state space. The state is completely 
specified by the system's state vector, a unit vector repre- 
sented as a sum of basis states that span the state space. 
However, if one chooses a basis consisting of the eigenstates
of an observable and associates them with internal
states of a quantum automaton, there is a simple correspondence
between a state vector of a quantum dynamical
system (a superposition of basis states) and a state
of a quantum automaton (a superposition over internal
states). Thus, we will use the terms internal states (of
an automaton) and basis states (of a quantum dynam- 
ical system's state space) interchangeably. By similar 
reasoning, the state vector (of a quantum dynamical sys- 
tem) and state (of a quantum automaton) will be used 
interchangeably. 

In the vocabulary of quantum mechanics, at any mo- 
ment in time a given quantum automaton is in a pure 
state — another label for a superposition over internal 
states. An observer's best guess as to the automaton's 
current pure state is a probability distribution over state 
vectors — the so-called mixed state. 

It is helpful to imagine a collection of individual quan- 
tum automata, each in a (pure) state, that is specified by 
a distribution of weights. One can also imagine a single 
quantum automaton being in different pure states at dif- 
ferent moments in time. The time-averaged state then is 
also a mixed state. It is the latter picture that we adopt 
here. 

The fact that a quantum pure state can be a super- 
position of basis states is regarded as the extra structure 
of quantum mechanics that classical mechanics does not 
have. We respect this distinction by building a hierarchy 
of quantum states that goes from basis states to superpositions
of basis states to mixtures of superpositions. The
analogous classical-machine hierarchy goes only from in- 
ternal states to distributions over internal states. 



B. Quantum Measurement 

We now turn to the measurement process, a crucial and 
also distinctive component in the evolution of a quantum 
dynamical system, and draw parallels with quantum au- 
tomata. In setting up an experiment, one makes choices 
of how and when to measure the state of a quantum sys- 
tem. These choices typically affect what one observes, 
and in ways that differ radically from classical dynami- 
cal systems. 

Measurement is the experimental means of character- 
izing a system in the sense that the observed symbols 
determine the process language and any subsequent pre- 
diction of the system's behavior. The measurement of a 
quantum mechanical system is described by a Hermitian 
operator that projects the current state onto one (or sev- 
eral) of the operator's eigenstates. After a measurement, 
the system is, with certainty, in the associated (subset of) 
eigenstate(s). Such an operator is also called an observ- 
able and the eigenvalues corresponding to the eigenstates 
are the observed measurement outcomes. 

To model this situation with a quantum automaton, 
we identify the states of the automaton with the eigen- 
states of a particular observable. A measurement is de- 
fined through an operator that projects the automaton's 
current state vector onto one (or a subset) of its internal 
(basis) states. The "observed" measurement outcome is 
emitted as a symbol labeling the transition(s) which enter 
that internal state (or that subset of states). 



VI. QUANTUM TRANSDUCERS 

The study of quantum finite-state automata has pro- 
duced a veritable zoo of alternative models for language 
recognition. (These are reviewed below in Section VII B.)
Since we are interested in recognition, generation, and
transduction of process languages, we start out defining
a generalized quantum finite-state transducer and then
specialize. We develop a series of quantum finite-state au- 
tomaton models that are useful for recognition and gener- 
ation and, ultimately, for modeling intrinsic computation 
in finitary quantum processes. It is worth recalling that 
these quantum finite-state machines form the lowest level 
of a hierarchy of quantum computational models. Thus, 
they are less powerful than quantum Turing machines. 
Nevertheless, as we will see, they exhibit a diversity of 
interesting behaviors. And, in any case, they represent 
currently feasible quantum computers. 



A. Definition 

We define a quantum transducer that corresponds to 
the standard quantum mechanical description of a phys- 
ical experiment. 

Definition. A quantum transducer (QT) is a tuple
$\{Q, \langle\psi| \in \mathcal{H}, X, Y, T(Y|X)\}$ where

1. $Q = \{q_i : i = 0, \ldots, n-1\}$ is a set of $n$ internal
states.

2. The state vector $\langle\psi|$ lies in an $n$-dimensional
Hilbert space $\mathcal{H}$; its initial value is the start state
$\langle\psi^0|$.

3. $X$ and $Y$ are finite alphabets for input and output
symbols, respectively.

4. $T(Y|X)$ is a set of $n \times n$ transition matrices
$\{T(y|x) = U(x)P(y), x \in X, y \in Y\}$ that are products of

(a) a unitary matrix $U(x)$: $U^\dagger(x) = U^{-1}(x)$ ($\dagger$
denotes complex transpose); and

(b) a projection operator $P(y)$.

At each time step a quantum transducer (QT) reads a
symbol $x \in X$ from the input, outputs a symbol $y \in Y$,
and updates its state vector $\langle\psi|$ via $T(y|x)$.

The preceding discussion of state leads to the following 
correspondence between a QT's internal states and state 
vectors. 

Definition. One associates an internal state $q_i \in Q$ with
the eigenstate $\langle\phi_i|$ of an observable such that:

1. For each $q_i \in Q$ there is a basis vector $\langle\phi_i| =
(0, \ldots, 1, \ldots, 0)$ with a 1 in the $i$th component.

2. The set $\{\langle\phi_i| : i = 0, 1, \ldots, n-1\}$ spans the Hilbert
space $\mathcal{H}$.

Definition. A state vector $\langle\psi| \in \mathcal{H}$ is a unit vector. It
can be expanded in terms of basis states $\langle\phi_i|$:

$$\langle\psi| = \sum_{i=0}^{n-1} c_i \langle\phi_i| , \tag{18}$$

with $c_i \in \mathbb{C}$ and $\sum_{i=0}^{n-1} c_i^* c_i = 1$.

Identifying internal states $q_i$ and basis states $\langle\phi_i|$ connects
the machine view of a quantum dynamical system
with that familiar from standard developments of quantum
mechanics. A QT state is given by its current state
vector $\langle\psi|$. At each time step a symbol $x$ is read in, which
selects a unitary operator U(x). The operator is applied 
to the state vector and the result is measured via P(y). 
The output, an eigenvalue of the observable, is symbol y. 

We describe a QT's operation via the evolution of a
bra (row) vector. We make this notational choice, which
is unconventional in quantum mechanics, for two reasons
that facilitate comparing classical and quantum automata.
First, the state distribution of a classical finite-state
machine is given conventionally by a row vector.
And second, the graphical meaning of a transition from
state $i$ to $j$ is reflected in the transition matrix entries
$T_{ij}$ only if one uses row vectors and left multiplication
with $T$. This is also the convention for stochastic processes.



QTs model a general experiment on a quantum dy- 
namical system. As such they should be contrasted with 
the sequential machines and transducers of Refs. [13] and
[38], respectively, that map the current quantum state
onto an output. This mapping, however, is not associ- 
ated with a measurement interaction and lacks physical 
interpretation. 



B. Measurement

The projection operators are familiar from quantum
mechanics and can be defined in terms of the internal
(basis) states as follows.

Definition. A projection operator $P(y)$ is the linear operator

$$P(y) = |\phi_i\rangle\langle\phi_i| , \tag{19}$$

where $\phi_i$ is the eigenvector of the observable with eigenvalue
$y$. In the case of degeneracy $P(y)$ sums over a
complete set $\{i\}$ of mutually orthogonal eigenstates:

$$P(y) = \sum_{\{i\}} |\phi_i\rangle\langle\phi_i| . \tag{20}$$

Each $P$ is Hermitian ($P^\dagger = P$) and idempotent ($P^2 = P$).

$\mathbf{P} = \{P(y) : y \in Y \cup \{\lambda\}\}$ is the set of projection
operators with $\sum_{y \in Y} P(y) = \mathbf{1}$, where $\mathbf{1}$ is the identity
matrix. $\lambda$ is the null symbol and a placeholder for "no
measurement". We take $P(\lambda) = \mathbf{1}$ and do not include it
in the calculation of word probabilities, for example. "No
measurement" differs from a non-selective measurement
where a projection takes place, but the outcome is not detected.
The decision whether to perform a measurement
or not is considered an input to the QT.

In the eigenbasis of a particular observable the corresponding
matrices only have 0 and 1 entries. In the
following we assume such a basis. In addition, we consider
only projective measurements which apply to closed
quantum systems. (Open systems will be considered elsewhere.)

In quantum mechanics, one distinguishes between
degenerate and non-degenerate measurement operators
[36]. A non-degenerate measurement operator projects
onto one-dimensional subspaces of $\mathcal{H}$. That is, the eigenvectors
of the operator all have distinct eigenvalues. In
contrast, the operators associated with a degenerate measurement
have degenerate eigenvalues. Such an operator
projects onto higher-dimensional subspaces of $\mathcal{H}$. After
such a measurement the QT is potentially in a superposition
of states $\sum_i c_i \langle\phi_i|$, where $i$ sums over the degenerate
set of mutually orthogonal eigenstates. Just as
degeneracy leads to interesting consequences in quantum
physics, we will see in the examples to follow that degenerate
eigenvalues lead to interesting quantum languages.

C. Evolution and Word Distributions

We can now describe a QT's operation as it scans its
input. Starting in state $\langle\psi^0|$ it reads in a symbol $x \in X$
from an input word and updates its state by applying the
unitary matrix $U(x)$. Then the state vector is projected
with $P(y)$ and renormalized. Finally, symbol $y \in Y$ is
emitted. That is, the state vector after a single time-step
of a QT is given by:

$$\langle\psi(y|x)| = \frac{\langle\psi^0|T(y|x)}{\sqrt{\langle\psi^0|T(y|x)T^\dagger(y|x)|\psi^0\rangle}} = \frac{\langle\psi^0|U(x)P(y)}{\sqrt{\langle\psi^0|U(x)P(y)U^\dagger(x)|\psi^0\rangle}} . \tag{21}$$

In the following we drop the renormalization factor in the
denominator to enhance readability. It will be mentioned
explicitly when a state is not to be normalized.

When a QT reads in a length-$L$ word $x^L \in X^L$ and
outputs a length-$L$ word $y^L \in Y^L$, the transition matrix
becomes

$$T(y^L|x^L) = U(x_0)P(y_0) \cdots U(x_{L-1})P(y_{L-1}) \tag{22}$$

and the updated state vector is

$$\langle\psi(y^L|x^L)| = \langle\psi^0|T(y^L|x^L) . \tag{23}$$

Starting the QT in $\langle\psi^0|$ the conditional probability
$\Pr(y|x)$ of the output symbol $y$ given the input symbol
$x$ is calculated from the state vector in Eq. (21), before
renormalization:

$$\Pr(y|x) = \langle\psi(y|x)|\psi(y|x)\rangle . \tag{24}$$
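The update rule and the probability calculation can be sketched for a toy transducer. Everything concrete here is hypothetical and chosen only for illustration: a two-dimensional state space, a single input symbol "a" whose unitary is the Hadamard matrix, and two non-degenerate projectors labeled "0" and "1". The helper names are likewise assumptions.

```python
import math

S = 1 / math.sqrt(2)
# Hypothetical QT: one input symbol with a Hadamard unitary, two output symbols.
U = {"a": [[S, S], [S, -S]]}
P = {"0": [[1, 0], [0, 0]], "1": [[0, 0], [0, 1]]}


def row_times(row, M):
    """Bra (row) vector times matrix, following the left-multiplication convention."""
    n = len(row)
    return [sum(row[i] * M[i][j] for i in range(n)) for j in range(n)]


def step(psi, x, y):
    """Unnormalized single-step update <psi| U(x) P(y)."""
    return row_times(row_times(psi, U[x]), P[y])


def norm_sq(psi):
    """Squared norm of a (possibly complex) row vector."""
    return sum(abs(c) ** 2 for c in psi)


def cond_prob(psi0, xs, ys):
    """Probability of outputs ys given inputs xs: squared norm of the unnormalized state."""
    psi = list(psi0)
    for x, y in zip(xs, ys):
        psi = step(psi, x, y)
    return norm_sq(psi)


if __name__ == "__main__":
    psi0 = [1.0, 0.0]
    print(cond_prob(psi0, "a", "0"), cond_prob(psi0, "a", "1"))
    print(cond_prob(psi0, "aa", "00"))
```

Keeping the state unnormalized between steps is what lets the squared norm at the end serve directly as the word probability; renormalizing after each step would instead yield the conditional state.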



The probability $\Pr(y^L|x^L)$ of output sequence $y^L$ conditioned
on input sequence $x^L$ is calculated similarly using
Eq. (23):

$$\Pr(y^L|x^L) = \langle\psi(y^L|x^L)|\psi(y^L|x^L)\rangle . \tag{25}$$

D. Properties

We draw out several properties of QTs on our way to 
understanding their behavior and limitations. 

Proposition 5. A QT's output alphabet size is bounded:
$|Y| \le \dim(\mathcal{H})$.

Proof. This follows from the QT definition since output
symbols are directly associated with eigenvalues. The
number of eigenvalues is bounded by the dimension of the
Hilbert space.

Many properties of QTs are related to a subclass of 
STs, those with doubly stochastic transition matrices. 
Given this, it is useful to recall the relationship between 
unitary and doubly stochastic matrices. 

Definition. Given a unitary matrix $U$, the matrix $M$ with
$M_{ij} = |U_{ij}|^2$ is called a unistochastic matrix.

A unistochastic matrix is doubly stochastic, which
follows from the properties of unitary matrices. Compared
to a stochastic transducer, a QT's structure is
constrained through unitarity and this is reflected in its
architecture. A path exists between node $i$ and node $j$
when $M_{ij} > 0$. An equivalent description of a quantum
transducer is given by its graphical representation.
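The claim that a unistochastic matrix is doubly stochastic is easy to check numerically. In the sketch below the particular unitary (a rotation with a complex phase on its second row) and the helper names are hypothetical choices made for illustration.

```python
import cmath
import math


def unistochastic(U):
    """Entrywise squared magnitudes of a matrix given as nested lists."""
    return [[abs(u) ** 2 for u in row] for row in U]


def is_doubly_stochastic(M, tol=1e-12):
    """True if every row and every column of M sums to one within tol."""
    n = len(M)
    rows_ok = all(abs(sum(M[i]) - 1) < tol for i in range(n))
    cols_ok = all(abs(sum(M[i][j] for i in range(n)) - 1) < tol for j in range(n))
    return rows_ok and cols_ok


# Hypothetical unitary: rotation by THETA, second row multiplied by a phase.
THETA = 0.3
PHASE = cmath.exp(1j * 0.7)
U = [
    [math.cos(THETA), math.sin(THETA)],
    [-PHASE * math.sin(THETA), PHASE * math.cos(THETA)],
]
M = unistochastic(U)

if __name__ == "__main__":
    print(M)
    print(is_doubly_stochastic(M))
```

Rows sum to one because the rows of a unitary matrix are unit vectors, and columns sum to one because its columns are as well; the phases drop out entirely, so the graph only records which transitions have nonzero amplitude.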

Recalling the types of graph state defined in Sec. III B,
we find that only a subset occur in QTs. Specifically, a
QT has no transient states.

Proposition 6. Every node $i$ of $G(QT)$, if connected to
a set of nodes $j \neq i$, is a member of a strongly connected
set.

Proof. Given that one path exists from (say) i to j , we 
must show that the reverse one exists, going from j to 
i. According to the definition of a path it is sufficient 
to show this for the unistochastic matrix $M_{ij} = |U_{ij}|^2$.
A doubly stochastic matrix can always be expressed as a 
linear combination of permutation matrices. Thus, any 
vector (0, 0, . . . , 1, . . . ) with only one 1 entry can be per- 
muted into any other vector with only one 1 entry. This 
is equivalent to saying that, if there is a path from node 
i to j there is a path from j to i. 

The graph properties of a unitary matrix mentioned
here should be compared with those discussed by Severini
[39] and others. The graph of a finite-state machine
specified by a unitary matrix is a directed graph, or digraph.
A digraph vertex is a source (sink) if it has no
ingoing (no outgoing) arcs. A digraph vertex is isolated
if it is not joined to another. Ref. [39] characterizes these
machines by assuming their digraphs have no isolated
nodes, no sinks, and no sources. Given the preceding
proposition the nonexistence of sinks or sources follows
simply from assuming no isolated nodes.

One concludes that QT graphs are a limited subset of 
digraphs, namely the strongly connected ones. Further- 
more, there is a constraint on incoming edges to a node. 

Proposition 7. All incoming transitions to an internal 
state are labeled with the same output symbol. 

Proof. Incoming transitions to internal state $q_i$ are labeled
with output symbol $y$ if $\langle\phi_i|$ has eigenvalue $y$. Every
eigenstate has a unique eigenvalue, and so the incoming
transitions to any particular state $q_i$ are labeled with the
same output symbol representing one eigenvalue.



Proposition 8. A QT's transition matrices $T(y|x)$
uniquely determine the unitary matrices $U(x)$ and the
projection operators $P(y)$.

Proof. Summing the $T(y|x)$ over all $y$ for each $x$ yields
the unitary matrices $U(x)$:

$$\sum_{y \in Y} T(y|x) = \sum_{y \in Y} U(x)P(y) = U(x) . \tag{26}$$

The $P(y)$ are obtained, from any of the $U(x)$, through
the inverse $U^{-1}(x) = U^\dagger(x)$:

$$P(y) = U^\dagger(x)T(y|x) . \tag{27}$$
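The two steps of this proof can be traced numerically. The sketch below uses a hypothetical single-input QT (Hadamard unitary, two rank-one projectors); the matrices and the helper names `matmul`, `dagger`, and `close` are assumptions made for illustration.

```python
import math

S = 1 / math.sqrt(2)
# Hypothetical QT ingredients for a single input symbol.
U = [[S, S], [S, -S]]
P = {"0": [[1, 0], [0, 0]], "1": [[0, 0], [0, 1]]}


def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagger(A):
    """Conjugate transpose."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]


def close(A, B, tol=1e-12):
    """Entrywise comparison within tol."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Transition matrices built as unitary times projector.
T = {y: matmul(U, P[y]) for y in P}
# Recover the unitary by summing the transition matrices over outputs.
U_REC = [[sum(T[y][i][j] for y in T) for j in range(2)] for i in range(2)]
# Recover each projector by applying the inverse (conjugate transpose) of the unitary.
P_REC = {y: matmul(dagger(U_REC), T[y]) for y in T}

if __name__ == "__main__":
    print(close(U_REC, U), all(close(P_REC[y], P[y]) for y in P))
```

The recovery of the unitary relies on the projectors summing to the identity; with an incomplete set of projectors the sum would not be unitary and the second step would fail.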

Definition. A QT is reversible if the automaton defined 
by the transpose of each U(x) and P(y) is also a QT. 

Proposition 9. QTs are reversible. 

Proof. The transpose of a unitary matrix is unitary. 
The transpose of a projection operator is the operator it- 
self. 

Graphically, the reversed QT is obtained by simply 
switching the direction of the edges. This produces a 
transducer with the transition amplitudes $T_{ji}$, formerly
$T_{ij}$. The original input and output symbols, which
labeled ingoing edges to state $q_i$, remain unchanged.
Therefore, in general, the languages generated by a QT 
and its reverse are not the same. By way of contrast, 
this simple operation applied to an ST does not, in gen- 
eral, yield another ST. A simple way to summarize these 
properties is that a QT forms a group, an ST forms a 
semi-group. 

VII. QUANTUM RECOGNIZERS AND 
GENERATORS 

The quantum transducer is our most general construct, 
describing a quantum dynamical process in terms of in- 
puts and outputs. We will now specialize quantum trans- 
ducers into recognizers and generators. We do this by 
paralleling the strategy adopted for developing classes of 
stochastic transducers. For each machine class we first 
give a general definition and then specialize, for example, 
yielding deterministic variants. We establish a number of 
properties for each type and then compare their descrip- 
tive powers in terms of the process languages each class 
can recognize or generate. The results are collected to- 
gether in a computational hierarchy of finitary stochastic 
and quantum processes. 

A. Quantum Recognizers 

Quantum finite-state machines are almost exclusively
discussed as recognizing devices. Following our development
of a consistent set of quantum finite-state transducers,
we can now introduce quantum finite-state recognizers
as restrictions of QTs and compare these with
alternative models of quantum recognizers. Since we are
interested in the recognition of process languages our definition
of quantum recognizers differs from those introduced
elsewhere; see Sec. VII B below. The main difference
is the recognition of a process language including
its word distribution. The restrictions that will be imposed
on a QT to achieve this are similar to those of the
stochastic recognizer.

Definition. A quantum finite-state recognizer (QR) is a
quantum transducer with $|Y| = 1$ and $T(y|x) = UP(x) = T(x)$.

One can think of the output symbol $y$ as accept. The
condition for accepting a symbol is, then,

$$\Pr(x) = \langle\psi^0|T(x)T^\dagger(x)|\psi^0\rangle . \tag{28}$$

If no symbol is output the recognizer has halted and 
rejected the input. Operationally, recognition works as 
it does in the classical setting. An experimenter runs an 
ensemble of QRs on the same input. The frequency of 
acceptance can then be compared to the probability of 
the input string computed using the T(x). 

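Definition. A QR accepts a process language $\mathcal{P}$ with
word-probability threshold $\delta$, if and only if for all $w \in \mathcal{P}$

$$\left| \Pr(w) - \langle\psi^0|T(w)T^\dagger(w)|\psi^0\rangle \right| \le \delta \tag{29}$$

and for all $w \notin \mathcal{P}$, $\langle\psi^0|T(w)T^\dagger(w)|\psi^0\rangle = 0$.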

Acceptance or rejection happens at each time step. 
We also have deterministic versions of QRs. 

Definition. A quantum deterministic finite-state recog- 
nizer ( QDR ) is a quantum recognizer with transition ma- 
trices T(x) that have at most one nonzero element per 
row. 
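The acceptance criterion with a word-probability threshold can be sketched for a toy recognizer. All specifics are hypothetical: a two-state QR with a Hadamard unitary, one projector per input symbol, start state (1, 0), and a check restricted to words up to a finite length, which can only approximate the definition's quantification over all words.

```python
import math
from itertools import product

S = 1 / math.sqrt(2)
# Hypothetical QR: a single unitary and one projector per input symbol.
U = [[S, S], [S, -S]]
P = {"0": [[1, 0], [0, 0]], "1": [[0, 0], [0, 1]]}
PSI0 = [1.0, 0.0]  # assumed start state


def qr_prob(word):
    """Squared norm of the unnormalized state after reading word, one unitary-then-projector step per symbol."""
    psi = list(PSI0)
    for x in word:
        psi = [sum(psi[i] * U[i][j] for i in range(2)) for j in range(2)]
        psi = [sum(psi[i] * P[x][i][j] for i in range(2)) for j in range(2)]
    return sum(abs(c) ** 2 for c in psi)


def accepts(language, delta, max_len, tol=1e-12):
    """Finite check of the threshold criterion; language maps words to probabilities."""
    for L in range(1, max_len + 1):
        for w in map("".join, product("01", repeat=L)):
            target = language.get(w, 0.0)
            q = qr_prob(w)
            if target == 0.0 and q > tol:
                return False  # a word outside the language must have zero probability
            if abs(target - q) > delta + tol:
                return False
    return True


# A fair-coin process language up to length 4, used as the comparison target.
FAIR_COIN = {"".join(w): 0.5 ** L for L in range(1, 5) for w in product("01", repeat=L)}

if __name__ == "__main__":
    print(accepts(FAIR_COIN, 0.0, 4))
    print(accepts({"0": 0.9, "1": 0.1}, 0.0, 1))
```

With these choices every symbol halves the squared norm, so the toy recognizer reproduces the fair-coin word distribution and is accepted even at threshold zero, while a biased distribution on the same support is accepted only once the threshold is large enough.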

B. Alternatives 

Quantum finite automata were introduced by several
authors in different ways, and they recognize different
classes of languages. To our knowledge the first mention
of quantum automata was made by Albert in 1983
[40]. Albert's results have been subsequently criticized
by Peres as being based on an inadequate notion of measurement
[41].

Kondacs and Watrous introduced 1-way and 2-way
quantum finite-state automata [42]. The 1-way automata
read symbols once and from left to right (say) in the input
word. Their 2-way automata scan the input word many
times moving either left to right or right to left. The automata
allow for measurements at every time step, checking
for acceptance, rejection, or continuation. They show
that a 2-way QFA can recognize all regular languages and
some nonregular languages. 1-way QFA are less powerful:
They can only recognize a subset of the regular languages.
A more powerful generalization of a 1-way QFA
is a 1-way QFA that allows mixed states, introduced by
Aharonov et al. [43]. They also allow for nonunitary evolution.
Introducing the concept of mixed states simply
adds classical probabilities to quantum probabilities and
is inherent in our model of QTs.

The distinctions between these results and the QRs
introduced here largely follow from the difference between
regular languages and process languages. Thus, the
result in Ref. [42] that no 1-way quantum automaton can
recognize the language (0 + 1)*0 does not apply to QTs.
It clearly is a regular language, but not a process
language. Also, the result by Bertoni and Carpentieri that
quantum automata can recognize nonregular languages
does not apply here [44]. They find that a quantum
automaton that is measured only after the whole input has
been read in can recognize a nonregular language. A QR,
however, applies measurement operators for every
symbol that is being read in.

Moore and one of the authors introduced 1-way quantum
automata (without using the term "1-way") [45]. These
are less powerful than the 1-way automata of Kondacs and
Watrous, since they allow only for a single measurement
after the input has been read in. They also introduced a
generalized quantum finite-state automaton whose
transition matrices need not be unitary, in which case all
regular languages are recognized. A type of quantum
transducer mentioned earlier, a quantum sequential
machine, was introduced by Gudder [33]. The link, however,
between machine output and quantum physical
measurement is missing. Freivalds and Winter introduced
quantum transducers [38] that at each step perform a
measurement to determine acceptance, rejection, or
continuation of the computation. In addition, they map the
current quantum state onto an output. Here too, the
mapping is not associated with a measurement
interaction and lacks physical interpretation.

These alternative models for quantum automata ap- 
pear to be the most widely discussed. There are others, 
however, and so the above list is by no means complete. 
Our motivation to add yet another model of quantum 
finite-state transducer and recognizer to this list is the 
inability of the alternatives to recognize or process lan- 
guages that represent quantum dynamical systems sub- 
ject to repeated measurement. 



C. Quantum Generators 

We now introduce quantum finite-state generators as 
restrictions of QTs and as a complement to recogniz- 
ers. They serve as a representation for the behavior of 
autonomous quantum dynamical systems. In contrast 
to quantum finite-state recognizers, quantum finite-state 
generators appear to not have been discussed before. A 
quantum generator is a QT with only one input. As in 
the classical case, one can think of the input as a clock 
signal that drives the machine through its transitions.

Definition. A quantum finite-state generator (QG) is a
quantum transducer with $|X| = 1$.

At each step it makes a transition from one state to 
another and emits a symbol. As in the classical case 
there are nondeterministic (just implicitly defined) and 
deterministic QGs. 

Definition. A quantum deterministic finite-state gener- 
ator ( QDG) is a quantum generator in which each matrix 
T(y) has at most one nonzero entry per row. 

Interestingly, there is a mapping from a given QDG to 
a classical automaton. 

Definition. Given a QDG $M = \{U, P(y)\}$, the equivalent
(classical) SDG $M' = \{T(y)\}$ has unistochastic
state-to-state transition matrix $T$ with components
$T_{ij} = |U_{ij}|^2$.

We leave the technical interpretation of "equivalent"
to Thm. 2 below.

As mentioned earlier, in quantum mechanics one dis- 
tinguishes between degenerate and non-degenerate mea- 
surements. Having introduced the different types of 
quantum generators, we can now make a connection to 
degenerate measurements. 

Definition. A quantum complete finite-state genera- 
tor (QCG) is a quantum generator observed via non- 
degenerate measurements. 

In order to average over observations, we must extend
the formalism of quantum automata to describe
distributions over state vectors. Recalling the notions of state
discussed in Section IV A, this means we need to describe
mixed states and their evolution.

Let a system be described by a state vector $\langle\psi_i|$ at time
$t$. If we do not know the exact form of $\langle\psi_i|$ but only a
set of possible $\langle\psi_i|$, $i = 0, \ldots, k-1$, then we give the best
guess as to the system's state in terms of a statistical
mixture of the $\langle\psi_i|$. This statistical mixture is represented
by a density operator $\rho$ with weights $p_i$ assigned to the
$\langle\psi_i|$:

$$\rho = \sum_{i=0}^{k-1} p_i \, |\psi_i\rangle \langle\psi_i| . \tag{30}$$

The main difference from common usage of "mixed state" 
is that we compare the same state over time; whereas, 
typically different systems are compared at a single time. 
Nevertheless, in both cases, the density matrix formalism 
applies. 



D. Properties 

With this notation in hand, we can now establish a 
number of properties of quantum machines. 



Definition. A QG's stationary state $\rho^s$ is the mixed
state that is invariant under unitary evolution and
measurement:

$$\rho^s = \sum_{y \in Y} P(y) U^\dagger \rho^s U P(y) . \tag{31}$$

$\rho^s$ is the mixed state which the quantum machine is in
on average, since we are describing a single system that is
always in a pure state. The stationary state is therefore
the best guess of an observer ignorant of the machine's
state.

Theorem 1. A QG's stationary state is the maximally
mixed state:

$$\rho^s = n^{-1} \sum_{i=0}^{n-1} |\phi_i\rangle \langle\phi_i| = \mathbb{1}/n . \tag{32}$$

Proof. Since the $\langle\phi_i|$ are basis states, $\rho^s$ is a diagonal
matrix equal to the identity multiplied by a factor.
Recall that the stationary distribution of a Markov chain
with doubly stochastic transition matrix is always
uniform [46]. And so, we have to establish that $\rho^s$ is an
invariant distribution:

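$$\begin{aligned}
\rho^s &= \sum_{y \in Y} P(y) U^\dagger \rho^s U P(y) && (33) \\
       &= n^{-1} \sum_{y \in Y} P(y) U^\dagger U P(y) && (34) \\
       &= n^{-1} \sum_{y \in Y} P(y) = \mathbb{1}/n . && (35)
\end{aligned}$$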
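The invariance argument of Eqs. (33)-(35) can be checked numerically. The following is a minimal sketch, not part of the original derivation: the random unitary, the particular projectors, and the helper name `measured_step` are illustrative assumptions.

```python
import numpy as np

def measured_step(rho, U, projectors):
    """Apply Eq. (31) once: rho -> sum_y P(y) U^dagger rho U P(y)."""
    U_dag = U.conj().T
    return sum(P @ U_dag @ rho @ U @ P for P in projectors)

# Assumed example: a random 3x3 unitary (Q factor of a QR decomposition)
# and orthogonal projectors onto a 2-dim and a 1-dim subspace.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
U, _ = np.linalg.qr(A)
projectors = [np.diag([1.0, 1.0, 0.0]), np.diag([0.0, 0.0, 1.0])]

n = U.shape[0]
rho_s = np.eye(n) / n                      # maximally mixed state, Eq. (32)
rho_next = measured_step(rho_s, U, projectors)
is_stationary = np.allclose(rho_next, rho_s)
print(is_stationary)
```

The check relies only on $U^\dagger U = \mathbb{1}$ and on the projectors being orthogonal and summing to the identity, which is why any unitary and any complete set of projectors would do here.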

Now we can calculate the asymptotic symbol probabilities,
using the density matrix formalism for computing
probabilities of measurement outcomes [47], and $\rho^s$.

Proposition 10. A QG's symbol distribution depends 
only on the dimensions of the projection operators and 
the Hilbert space. 

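Proof. Denote the trace operator by tr, then we have

$$\begin{aligned}
\Pr(y) &= \mathrm{tr}\left( T^\dagger(y) \rho^s T(y) \right) \\
       &= n^{-1} \mathrm{tr}\left( T^\dagger(y) T(y) \right) \\
       &= n^{-1} \mathrm{tr}\left( P^\dagger(y) U^\dagger U P(y) \right) \\
       &= n^{-1} \mathrm{tr}\left( P^\dagger(y) P(y) \right) \\
       &= n^{-1} \mathrm{tr}\left( P(y) \right) \\
       &= n^{-1} \dim P(y) . && (36)
\end{aligned}$$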

Although the single-symbol distribution is determined
by the dimension of the subspaces onto which the $P(y)$
project, distributions of words $y^L$ with $L > 1$ are not
similarly restricted. The asymptotic word probabilities
$\Pr(y^L)$ are:

$$\Pr(y^L) = \mathrm{tr}\left( T^\dagger(y^L) \rho^s T(y^L) \right) . \tag{37}$$

No further simplification is possible for the general case. 
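Equations (36) and (37) translate directly into a short numerical routine. The sketch below is illustrative only: the function name `word_probability`, the random unitary, and the choice of projectors are assumptions, and words are multiplied left to right in keeping with the row-vector convention used here.

```python
import numpy as np
from functools import reduce

def word_probability(U, projectors, word):
    """Pr(y^L) = tr(T^dagger(y^L) rho^s T(y^L)) with T(y) = U P(y), Eq. (37)."""
    n = U.shape[0]
    rho_s = np.eye(n) / n
    # Row-vector convention: T(y^L) = T(y_0) T(y_1) ... T(y_{L-1}).
    T = reduce(np.matmul, (U @ projectors[y] for y in word), np.eye(n))
    return float(np.real(np.trace(T.conj().T @ rho_s @ T)))

# Assumed example: random 3x3 unitary, projectors of dimension 1 and 2.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
U, _ = np.linalg.qr(A)
projectors = {0: np.diag([1.0, 0.0, 0.0]), 1: np.diag([0.0, 1.0, 1.0])}

p0 = word_probability(U, projectors, [0])   # n^{-1} dim P(0)
p1 = word_probability(U, projectors, [1])   # n^{-1} dim P(1)
total_len2 = sum(word_probability(U, projectors, [a, b])
                 for a in (0, 1) for b in (0, 1))
print(p0, p1, total_len2)
```

Whatever unitary is drawn, the single-symbol probabilities come out as the projector dimensions divided by $n$, while the length-2 probabilities depend on $U$ and only their sum is fixed.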

Analogous results follow for QRs, except that the
calculations are suitably modified to use $T(x)$.

E. Finitary Process Hierarchy

To better appreciate what these machines are capable
of, we draw on the preceding results to describe the
similarities and differences between quantum recognizers
and generators, as well as between stochastic and quantum
automata. We collect the results, give a summary and some
interpretation, and present a road map (Fig. 5) that lays
out the computational hierarchy of finitary quantum
processes. As above, S(M) denotes the stochastic language
associated with machine or machine type M and P(C), the
set of stochastic languages generated or recognized by all
machines in class C.

Proposition 11. QCGs are deterministic. 

Proof. Since all projection operators have dimension 
one, all transition matrices have at most one nonzero 
element per row. This is the condition for being a QDG. 

Non-degenerate measurements always define a QDG.
There are degenerate measurements, however, that also
can lead to QDGs, as we will show shortly. One concludes
that $P(QCG) \subset P(QDG)$.

We now show that for any QDG there is an SDG gener- 
ating the same stochastic language. Thereby we establish 
observational equivalence between the different classes of 
machine. 

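Theorem 2. Every S(QDG) is generated by some SDG:
$P(QDG) \subseteq P(SDG)$.

Proof. We show that the SDG generating S(QDG) is the
equivalent SDG, as defined in Sec. VII C, and that the
QDG $M$ and its equivalent SDG $M'$ generate the same
word distribution and so the same stochastic language.

The word probabilities $\Pr_M(y^L)$ for $M$ are calculated
using Eq. (37) and the QDG's transition matrices $T_M$:

$$\begin{aligned}
\Pr_M(y^L) &= \mathrm{tr}\left( T_M^\dagger(y^L) \rho^s T_M(y^L) \right) \\
           &= n^{-1} \mathrm{tr}\left( T_M^\dagger(y^L) T_M(y^L) \right) \\
           &= n^{-1} \sum_i \left[ T_M^\dagger(y^L) T_M(y^L) \right]_{ii} \\
           &= n^{-1} \sum_{i,j} \left| \left[ T_M(y^L) \right]_{ij} \right|^2 .
\end{aligned}$$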

The word probabilities PrM'(y L ) for M' are calculated 
using Eq. {1$ and the SDG's transition matrices Tm- 

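$$\begin{aligned}
\Pr_{M'}(y^L) &= \langle \pi^0 | T_{M'}(y^L) | \eta \rangle \\
              &= \sum_{i=0}^{n-1} \pi^0_i \sum_{j} \left( T_{M'}(y^L) \right)_{ij} \\
              &= n^{-1} \sum_{i,j=0}^{n-1} \left( T_{M'}(y^L) \right)_{ij} . && (38)
\end{aligned}$$

Since $|(T_M(y^L))_{ij}|^2 = (T_{M'}(y^L))_{ij}$, from the definition of
an equivalent SDG, the claim follows.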
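The equality of the two word distributions can be spot-checked numerically. The sketch below assumes, purely for illustration, a two-state QDG built from a Hadamard matrix and two rank-one projectors; the helper names are hypothetical. It forms the equivalent SDG by squaring the moduli of the QDG's transition-matrix entries and compares both word probabilities for all words up to length 4.

```python
import numpy as np
from functools import reduce
from itertools import product

# Assumed two-state QDG: Hadamard evolution, rank-one projectors.
U = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
P = {0: np.diag([1.0, 0.0]), 1: np.diag([0.0, 1.0])}

T_q = {y: U @ P[y] for y in P}               # QDG transition matrices
T_c = {y: np.abs(T_q[y]) ** 2 for y in P}    # equivalent SDG matrices

def chain(mats, word):
    """Product of transition matrices along a word, left to right."""
    return reduce(np.matmul, (mats[y] for y in word))

def pr_quantum(word):
    """n^{-1} tr(T^dagger T), i.e. Eq. (37) with rho^s = 1/n."""
    T = chain(T_q, word)
    return float(np.real(np.trace(T.conj().T @ T)) / T.shape[0])

def pr_classical(word):
    """Uniform initial distribution times T times the all-ones vector."""
    T = chain(T_c, word)
    return float(T.sum() / T.shape[0])

max_gap = max(abs(pr_quantum(w) - pr_classical(w))
              for L in range(1, 5) for w in product((0, 1), repeat=L))
print(max_gap)
```

The agreement depends on determinism: with at most one nonzero entry per row, the squared modulus of a matrix product equals the product of the entrywise squared moduli, so no interference terms arise.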

More than one QDG can be observationally equivalent 
to a given SDG. The reason for this to occur is that the 
quantum mechanical phases of the transition amplitudes 
cancel in the transformation from a QDG. 

We can now easily characterize languages produced by 
QDGs. 

Corollary 5. For every QDG, supp S(QDG) is a reg- 
ular language. 

Proof. This follows directly from Thm. [H and Cor. QJ 

Corollary 6. For every QDG, S(QDG) is a process lan- 
guage. 

Proof. This follows directly from Thm. [H and Cor. [H 

With this we can begin to compare the descriptive 
power of the different machine types. 

Proposition 12. QGs and QRs are equivalent: They
recognize and generate the same set of stochastic
languages, respectively: $P(QG) = P(QR)$.

Proof. Consider a QG's transition matrices $T(y) = U P(y)$
and form a new set $T(x) = U P(x)$ in which
$P(x) = P(y)$, associating the QR's input $X$ with the
QG's output $Y$. The $T(x)$ define a QR that recognizes
S(QG). It follows that $P(QG) \subseteq P(QR)$.

Now consider a QR's transition matrices $T(x) = U P(x)$
and form a new set $T(y)$ in which $P(y) = P(x)$,
associating inputs and outputs as above. The $T(y)$ define a
QG that generates S(QR).

It follows that $P(QG) = P(QR)$.

Corollary 7. QDGs and QDRs are equivalent: They
recognize and generate the same set of stochastic
languages, respectively: $P(QDG) = P(QDR)$.

Proof. Prop. 12's proof goes through if one restricts to
deterministic machines.

Corollary 8. For every QDR, supp S(QDR) is a regular
language.

Proof. This follows directly from Cor. [7] and Cor. [?| 

Corollary 9. For every QDR, S(QDR) is a process
language.

Proof. This follows directly from Cor. [5| and Cor. [7| 

Proposition 13. There exists an SDG such that
$\mathcal{P}(SDG)$ is not generated by any QDG.

Proof. The process language generated by the SDG given
by $T(0) = (1/\sqrt{2})$ and $T(1) = (1 - 1/\sqrt{2})$ (a biased
coin) cannot be generated by any QDG. According to
Prop. 10, $\Pr(y) = n^{-1} \dim P(y)$, which is a rational
number, whereas $\Pr(y)$ for the above biased coin is irrational.

FIG. 5: Finitary process language hierarchy: Each circle
represents the set of process languages recognized or generated
by the inscribed machine class. Increasing height indicates
proper containment; machine classes at the same height are
not directly comparable. The hierarchy summarizes the
theorems, propositions, and corollaries in Secs. IV C and VII E.



Corollary 10. $P(QDG) \subset P(SDG)$.

Proof. From Thm. [| and Prop. [7^1 

Corollary 11. $P(QDR) \subset P(SDR)$.

Proof. From Cor. [1 Cor. Thm. @ and Prop. [TJ 

At this point it is instructive to graphically summarize 
the relations between recognizer and generator classes. 
Figure 5 shows a machine hierarchy in terms of sets of
languages recognized or generated. The class of QCGs is 
at the lowest level. This is contained in the class of QDGs 
and QDRs. The languages they generate or recognize 
are properly included in the set of languages generated 
or recognized by classical deterministic machines — SDGs 
and SDRs. These, in turn, are included in the set of 
languages recognized or generated by classical nondeter- 
ministic machines, SGs and SRs, as well as QRs and 
QGs. 

The preceding results serve to indicate how portions 
of the finitary process hierarchy are organized. However, 
there is still more to understand. For example, the
regularity of the support of finitary process languages, the
hierarchy's dependence on acceptance threshold $\delta$, and
the comparability of stochastic and quantum
nondeterministic machines await further investigation.



VIII. QUANTUM GENERATORS AND 
FINITARY PROCESSES: EXAMPLES 

To appreciate what can be done with quantum ma- 
chines, we will illustrate various features of QTs by mod- 
eling several prototype quantum dynamical systems. We 
start out with deterministic QGs, building one to model 
a physical system, and end on an example that illustrates 
a nondeterministic QT. 



A. Two-State Quantum Processes 

According to Prop. 10, the symbol distribution generated
by a QG depends only on the dimension of the projection
operator and the dimension of the Hilbert space.
What are the consequences for two-state QGs? First of 
all, according to Cor. [5] the maximum alphabet size is 2. 
The corresponding projection operators can either have 
dimension 2 (for a single-letter alphabet) or dimension 
1 for a binary alphabet. The only symbol probabilities 
possible are Pr(y) = 1 for the single-letter alphabet and 
Pr(y) = 1/2 for a binary alphabet. So one can set aside 
the single-letter alphabet case as too simple. 

We also see that a binary-alphabet two-state QDG can 
produce only a highly restricted set of process languages. 
It is illustrative to look at the possible equivalent SDGs. 
Their state-to-state transition matrices are given by

$$T = \begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix} , \tag{39}$$

with $p \in \{0, 1/2, 1\}$.

For p = 1/2, for example, this is the fair coin process. 
It becomes immediately clear that the Golden Mean and 
the Even processes, which are modeled by two-state clas- 
sical automata, cannot be represented with a two-state 
QDG. (The three-state models are given below.) 



1. Iterated Beam Splitter 

Let's consider a physical two-state process and build a 
quantum generator for it. 

The iterated beam splitter is an example that, despite 
its simplicity, makes a close connection with real
experiment. Figure 6 shows the experimental apparatus.
Photons are sent through a beam splitter (thick dashed line),
producing two possible paths. The paths are redirected
by mirrors (thick horizontal solid lines) and recombined
at a second beam-splitter. From this point on the same 
apparatus is repeated indefinitely to the right. After the 
second beam-splitter there is a third and a fourth and 
so on. Single-photon quantum nondemolition detectors 
are located along the paths, between every pair of beam- 
splitters. One measures if the photon travels in the upper 
path and the other determines if the photon follows the 
lower path. 

This is a quantum dynamical system: a photon passing 
repeatedly through various beam splitters. It has a two- 
dimensional state space with two eigenstates, "above"
and "below". Its behavior is given by the evolution of a
state vector $\langle\psi|$. The overall process can be represented
in terms of a unitary operation for the beam splitter and
projection operators for the detectors. The unitary
operator for the beam splitter is the Hadamard matrix $U_H$:

$$U_H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} . \tag{40}$$

FIG. 6: Experimental set-ups for the iterated beam splitter:
Solid lines are mirrors; beam splitters, horizontal dashed lines.
Photon nondemolition detectors, marked as D, are placed
between every pair of beam splitters. Under measurement
protocol I all detectors are in operation; under protocol II only
the solid-line detectors are activated. The apparatus is
repeated indefinitely to the right.



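The measurement operators have the following matrix
representation in the experiment's eigenbasis:

$$P(0) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\quad \text{and} \quad
P(1) = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} , \tag{41}$$

where the measurement symbol 0 stands for "above" and
symbol 1 stands for "below".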

Before we turn to constructing a quantum finite-state 
generator to model this experiment we can understand 
intuitively the sequence of outcomes that results from 
running the experiment for long times. If entering the 
beam splitter from above, the detectors record the pho- 
ton in the upper or lower path with equal probability. 
Once the photon is measured, though, it is in that de- 
tector's path with probability 1. And so it enters the 
beam splitter again via only one of the two possible 
paths. Thus, the second measurement outcome will have 
the same uncertainty as the first: the detectors report 
"above" or "below" with equal probability. The resulting 
sequence of outcomes after many beam splitter passages 
is simply a random sequence. Call this measurement pro- 
tocol I. 

Now consider altering the experiment slightly by re- 
moving the detectors after every other beam splitter. In 
this configuration, call it protocol II, the photon enters 
the first beam splitter, does not pass a detector and in- 
terferes with itself at the next beam splitter. That inter- 
ference, as we will confirm shortly, leads to destructive 
interference of one path after the beam splitter. The pho- 
ton is thus in the same path after the second beam split- 
ter as it was before the first beam splitter. A detector 
placed after the second beam splitter therefore reports 
with probability 1 that the photon is in the upper path, 
if the photon was initially in the upper path. If it was 
initially in the lower path, then the detector reports that 
it is in the upper path with probability 0. The resulting 
sequence of upper-path detections is a very predictable 
sequence, compared to the random sequence from proto- 
col I. 

We now construct a QG for the iterated beam splitter
using the matrices of Eqs. (40)-(41) and the stationary
state of Eq. (32). The output alphabet consists of two
symbols denoting detection "above" or "below": $Y = \{0, 1\}$.
The set of states consists of the two eigenstates
of the system "above" and "below": $Q = \{A, B\}$. The
transition matrices are:

$$T(0) = U_H P(0) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} , \tag{42a}$$

$$T(1) = U_H P(1) = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 & 1 \\ 0 & -1 \end{pmatrix} . \tag{42b}$$

The resulting QG turns out to be deterministic, as can be
seen from its graphical representation, shown in Fig. 7.

The word distributions for the process languages
generated by protocols I and II are obtained from Eq. (37).
Word probabilities for protocol I (measurement at each
time step) are, to give some examples:

$$\begin{aligned}
\Pr(0) &= n^{-1} \dim(P(0)) = \tfrac{1}{2} , && (43a) \\
\Pr(1) &= n^{-1} \dim(P(1)) = \tfrac{1}{2} , && (43b) \\
\Pr(00) &= \mathrm{tr}\left( T^\dagger(0) T^\dagger(0) \rho^s T(0) T(0) \right) = \tfrac{1}{4} , && (43c) \\
\Pr(01) &= \Pr(10) = \Pr(11) = \tfrac{1}{4} . && (43d)
\end{aligned}$$

Continuing the calculation for longer words shows that
the word distribution is uniform at all lengths: $\Pr(y^L) = 2^{-L}$.

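For protocol II (measurement every other time step)
we find:

$$\begin{aligned}
\Pr(0) &= \mathrm{tr}\left( T^\dagger(\lambda 0) \rho^s T(\lambda 0) \right) = \tfrac{1}{2} , && (44a) \\
\Pr(1) &= \mathrm{tr}\left( T^\dagger(\lambda 1) \rho^s T(\lambda 1) \right) = \tfrac{1}{2} , && (44b) \\
\Pr(00) &= \mathrm{tr}\left( T^\dagger(\lambda 0 \lambda 0) \rho^s T(\lambda 0 \lambda 0) \right) = \tfrac{1}{2} , && (44c) \\
\Pr(11) &= \mathrm{tr}\left( T^\dagger(\lambda 1 \lambda 1) \rho^s T(\lambda 1 \lambda 1) \right) = \tfrac{1}{2} , && (44d) \\
\Pr(10) &= \Pr(01) = 0 . && (44e)
\end{aligned}$$

If we explicitly denote the output at the unmeasured time
step as $\lambda$, the sequence 11 turns into $\lambda 1 \lambda 1$, as do the
other sequences in protocol II. As one can see, the word
probabilities calculated from the QDG agree with our
earlier intuitive conclusions.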
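The word probabilities for both protocols can be reproduced with a few lines of NumPy. This is a minimal sketch under stated assumptions: the unmeasured step is represented by the character "L" (standing for $\lambda$) with $T(\lambda) = U_H$, and the function and variable names are illustrative.

```python
import numpy as np
from functools import reduce

U_H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)               # Eq. (40)
P = {"0": np.diag([1.0, 0.0]), "1": np.diag([0.0, 1.0])}     # Eq. (41)

# "L" stands for lambda: a time step without measurement, T(lambda) = U_H.
T = {"0": U_H @ P["0"], "1": U_H @ P["1"], "L": U_H}

def pr(word):
    """Eq. (37) with rho^s = 1/2, multiplying matrices left to right."""
    Tw = reduce(np.matmul, (T[s] for s in word))
    rho_s = np.eye(2) / 2
    return float(np.real(np.trace(Tw.conj().T @ rho_s @ Tw)))

words = ("00", "01", "10", "11")
protocol_I = {w: pr(w) for w in words}
protocol_II = {w: pr("L" + w[0] + "L" + w[1]) for w in words}
print(protocol_I)
print(protocol_II)
```

Under protocol II the two Hadamard steps between measurements multiply to the identity, so only the repeated-symbol words survive; under protocol I each measurement resets the superposition and all four words are equally likely.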

Comparing the iterated beam splitter QDG to its clas- 
sically equivalent SDG reveals several crucial differences 
in performance. Following the recipe from Sec. VII E on
how to build an SDG from a QDG gives the classical
generator shown in Fig. 8(a). Its transition matrices are:

$$T(0) = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}
\quad \text{and} \quad
T(1) = \frac{1}{2} \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} . \tag{45}$$

The symbol sequence generated by this SDG for protocol I
is the uniform distribution for all lengths, as can be

FIG. 7: Quantum finite-state machine for the iterated beam
splitter: The resulting symbol sequences are statistically
identical to the sequences obtained with the measurement
protocols I and II shown in Fig. 6. When no measurement is made,
transitions along all edges occur.




FIG. 8: Classical deterministic generators for the iterated 
beam splitter: (a) Protocol I and (b) protocol II, p = 2. (Cf. 
Fig. El) 



easily verified using Eq. (14) or, since it is deterministic,
Eq. (15). This is equivalent to the language generated
by the QDG under protocol I. However, the probability 
distribution of the sequences for the generator under pro- 
tocol II, ignoring every second output symbol, is still the 
uniform distribution for all lengths L. This could not be 
more different from the language generated by the QDG 
under protocol II. 

The reason is that the classical machine is unable to 
capture the interference effects present in experimental 
protocol II. A second SDG has to be constructed from the 
QDG's transition matrices for set-up II. This is done by 
carrying out the matrix product first and then forming 
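its equivalent SDG. The result is shown in Fig. 8(b). Its
transition matrices are:

$$T(0) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\quad \text{and} \quad
T(1) = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} . \tag{46}$$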



The two classical SDGs are clearly (and necessarily)
different. Thus, a single QG can model a quantum
system's dynamics for different measurement protocols,
whereas an SG captures only the behavior of each
individual experimental set-up. This simple example serves
to illustrate the utility of QGs over SGs in modeling the
behavior of quantum dynamical systems.
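The contrast between the two classical generators can be made concrete numerically. In the sketch below, which is illustrative rather than part of the original analysis, the unobserved step of the classical machine is modeled by summing its two transition matrices (an assumption about what "ignoring every second output symbol" means operationally), and the key "L" marks that step.

```python
import numpy as np
from functools import reduce

U_H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
P = {"0": np.diag([1.0, 0.0]), "1": np.diag([0.0, 1.0])}

def sdg_probability(T, word):
    """Uniform initial distribution times T(word) times the all-ones vector."""
    M = reduce(np.matmul, (T[s] for s in word))
    return float(M.sum() / M.shape[0])

# SDG (a): equivalent SDG of the protocol-I QDG, Eq. (45).
T_a = {y: np.abs(U_H @ P[y]) ** 2 for y in P}
T_a["L"] = T_a["0"] + T_a["1"]    # unobserved step: sum over both symbols

# SDG (b): carry out the quantum matrix product first, then square, Eq. (46).
T_b = {y: np.abs(U_H @ U_H @ P[y]) ** 2 for y in P}

words = ("00", "01", "10", "11")
ignoring_second = {w: sdg_probability(T_a, "L" + w[0] + "L" + w[1]) for w in words}
product_first = {w: sdg_probability(T_b, w) for w in words}
print(ignoring_second)
print(product_first)
```

Squaring before multiplying discards the relative phases that produce the destructive interference, which is why SDG (a) remains uniform under protocol II while SDG (b) reproduces Eqs. (44c)-(44e).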



B. Three-State Quantum Processes 

1. Golden Mean Quantum Machine 

Recall the classical Golden Mean generator of
Sec. IV B. A QDG, which generates the same process
language, is shown in Fig. 9. Consider a spin-1 particle
subject to a magnetic field that rotates its spin. The
state evolution can be described by the unitary matrix

$$U = \begin{pmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\
0 & 0 & -1 \\
-\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0
\end{pmatrix} , \tag{47}$$

which is a rotation in $\mathbb{R}^3$ around the y-axis by angle $\pi/4$
followed by a rotation around the x-axis by $\pi/2$.

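Using a suitable representation of the spin operators
$J_i$ [H, p. 199], such as

$$J_x = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & i \\ 0 & -i & 0 \end{pmatrix} , \quad
J_y = \begin{pmatrix} 0 & 0 & -i \\ 0 & 0 & 0 \\ i & 0 & 0 \end{pmatrix} , \quad
J_z = \begin{pmatrix} 0 & i & 0 \\ -i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} ,$$

the relation $P_i = 1 - J_i^2$ defines a
one-to-one correspondence between the projector $P_i$ and
the square of the spin component along the i-axis. This
measurement poses the yes-no question, Is the square
of the spin component along the i-axis zero? Consider
measuring $J_y^2$. Since $J_y^2 = \mathrm{diag}(1, 0, 1)$, the projection
operator for y-component zero is $P(0) = |010\rangle\langle 010|$, and
that for nonzero y-component is
$P(1) = |100\rangle\langle 100| + |001\rangle\langle 001|$. Together with $U$ they define
a quantum generator whose outputs are a sequence of the
spin's y-component.

The transition matrices $T(y)$ are then

$$T(0) = U P(0) = \begin{pmatrix}
0 & \frac{1}{\sqrt{2}} & 0 \\
0 & 0 & 0 \\
0 & \frac{1}{\sqrt{2}} & 0
\end{pmatrix} , \tag{48a}$$

$$T(1) = U P(1) = \begin{pmatrix}
\frac{1}{\sqrt{2}} & 0 & 0 \\
0 & 0 & -1 \\
-\frac{1}{\sqrt{2}} & 0 & 0
\end{pmatrix} . \tag{48b}$$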



To illustrate that this QDG produces the Golden Mean
word distribution we show how to calculate several of the
word probabilities using Prop. 10 and Eq. (37):

$$\Pr(0) = n^{-1} \dim(P(0)) = \tfrac{1}{3} , \qquad
\Pr(1) = n^{-1} \dim(P(1)) = \tfrac{2}{3} , \tag{49a}$$

$$\Pr(011) = \mathrm{tr}\left( T^\dagger(011) \rho^s T(011) \right) = \tfrac{1}{6} . \tag{49b}$$
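These Golden Mean word probabilities can be recomputed directly from Eq. (37). The sketch below rests on two assumptions that should be read as such: $U$ is taken to be the rotation described in the text (y-axis by $\pi/4$, then x-axis by $\pi/2$, acting on row vectors), and symbol 0 is assigned to the one-dimensional projector $|010\rangle\langle 010|$, as $\Pr(0) = 1/3$ requires. The helper name `pr` is illustrative.

```python
import numpy as np
from functools import reduce

s = 1 / np.sqrt(2)
# Assumed form of the unitary of Eq. (47).
U = np.array([[s, s, 0.0],
              [0.0, 0.0, -1.0],
              [-s, s, 0.0]])

# Assumed symbol assignment: 0 <-> one-dimensional subspace.
P = {"0": np.diag([0.0, 1.0, 0.0]), "1": np.diag([1.0, 0.0, 1.0])}
T = {y: U @ P[y] for y in P}

def pr(word):
    """Eq. (37) with rho^s = 1/3, multiplying matrices left to right."""
    Tw = reduce(np.matmul, (T[y] for y in word))
    return float(np.real(np.trace(Tw.conj().T @ Tw)) / 3)

print(pr("0"), pr("1"), pr("00"), pr("011"))
```

The vanishing probability of the word 00 is the defining restriction of the Golden Mean process (no consecutive 0s), and the remaining values match those of the classical two-state generator.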



2. Quantum Even Process

The next example is a quantum representation of the
Even Process. Consider the same spin-1 particle. This
time the x component is chosen as observable. Then
$U$ and $P(0) = |100\rangle\langle 100|$ and
$P(1) = |010\rangle\langle 010| + |001\rangle\langle 001|$ define
a quantum finite-state generator. The QDG is shown in
Fig. 10. The word distributions for lengths up to $L = 9$
are shown in Fig. 11.

FIG. 9: Quantum generator for the Golden Mean Process.

FIG. 10: Quantum generator for the Even Process.

FIG. 11: Process language of the Even QDG.

Note that the unitary evolution for the Golden Mean
Process and the Even Process are the same, just as the
state-to-state transition matrices were the same for their
classical versions. The partitioning into subspaces
induced by the projection operators leads to the
(substantial) differences in the word distributions; cf. Figs. [T]
versus 11.

The dependence on subspace partitioning indicates
a way to count the number of QGs for each unitary
evolution $U$. For 3-dimensional Hilbert spaces this is
rather straightforward. For each unitary matrix and
with a binary alphabet we have three choices for
partitioning subspaces of the Hilbert space: one subspace
is two-dimensional and the other one-dimensional. This
yields three QGs that are distinct up to symbol exchange
($0 \leftrightarrow 1$). For the unitary matrix that generates the
Golden Mean and the Even Process (Eq. (47)) the third
QG turns out to be nondeterministic. But no phase
interference is possible and it generates the Golden Mean
process language. The potential non-Markovian (sofic)
nature of these quantum processes has been discussed in
Ref.

This very limited number of possible QGs for any given
unitary matrix is yet another indication of the limitations
of QGs. Classical SGs do not have the same structural
restrictions, since they are not bound by orthogonal
partitioning into subspaces, for example. The saving grace
for QGs is that they have complex transition amplitudes
and so can compute with phase, as long as they are not
observed. This is reflected in the distinct languages
generated by one QG under different measurement protocols.
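The count of three generators per unitary can be explored numerically. The following sketch is illustrative and makes explicit assumptions: $U$ has the rotation form described for the Golden Mean machine, symbol 0 is always assigned to the one-dimensional subspace, and the helper names `generator`, `is_deterministic`, and `pr` are hypothetical.

```python
import numpy as np
from functools import reduce

s = 1 / np.sqrt(2)
# Assumed form of the shared unitary (rotation about y, then about x).
U = np.array([[s, s, 0.0],
              [0.0, 0.0, -1.0],
              [-s, s, 0.0]])

def generator(k):
    """QG with symbol 0 on the 1-dim subspace spanned by basis state k."""
    P0 = np.zeros((3, 3))
    P0[k, k] = 1.0
    P1 = np.eye(3) - P0
    return {"0": U @ P0, "1": U @ P1}

def is_deterministic(T):
    """At most one nonzero entry per row in every transition matrix."""
    return all(int((np.abs(M) > 1e-12).sum(axis=1).max()) <= 1
               for M in T.values())

def pr(T, word):
    """Eq. (37) with rho^s = 1/3, multiplying matrices left to right."""
    Tw = reduce(np.matmul, (T[y] for y in word))
    return float(np.real(np.trace(Tw.conj().T @ Tw)) / 3)

deterministic = [is_deterministic(generator(k)) for k in range(3)]
even = generator(0)     # x-component partition
third = generator(2)    # remaining partition
print(deterministic)
print(pr(even, "010"), pr(even, "0110"), pr(third, "00"), pr(third, "011"))
```

With this labeling, k = 0 gives the Even Process (an odd block of 1s between 0s has probability zero), k = 1 gives the Golden Mean machine, and k = 2 gives the nondeterministic generator, whose word probabilities for these short words coincide with the Golden Mean values.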



C. Four-State Quantum Process 

We are now in a position to explore the full capabilities 
of QTs, turning from generators to transducers. The 
following example illustrates quantum machines by using 
the tools required to investigate information processing 
of quantum dynamical systems. 



1. Quantum Transducer for Trapped Ions

Consider an atom exposed to short wavelength 
radiation — the core of numerous experiments that inves- 
tigate electronic structure and dynamics. The usual pro- 
cedure is a one-time experiment, exposing the atom to 
radiation and monitoring changes in structure through 
electron or photon detectors. As a particular set-up we 
choose ion-trap experiments found in low-temperature 
physics and quantum computation implementations, as 
described in Ref. [7]. For our present purposes it will be
sufficient to review the general physical setting.

FIG. 12: Schematic view of two vibrationally-coupled trapped
ions undergoing electronic excitation. Only the two electronic
levels of interest are drawn.


Imagine a pair of ions kept in a trap by laser fields and 
static electromagnetic fields. Only two of the electronic 
levels of each ion are of interest: the ground state and an 
excited state. Call these level 0 and level 1, respectively.
A third auxiliary level is required for laser cooling and 
other operations, which we leave aside here since it has 
no significance for the description of the process. The two 
ions are coupled to each other through phonon exchange, 
as shown schematically in Fig. 12.

By choosing suitable wavelengths several distinct op- 
erators can be implemented. One of them is a Hadamard 
operator that produces a superposition of electronic 
states |0) and |1). Another is a phase operator that yields 
an entangled state of the two ions. The respective laser 
pulses, so-called Rabi pulses, induce an electronic exci- 
tation and a vibrational excitation. The result is vibra- 
tional coupling of the four levels. All other operations 
are combinations of these two; see Ref. Q- The opera- 
tors are named U a , Ub, and U c ; matrix representations 
are given below. As is already familiar from the iterated 
beam splitter, the operators are activated repeatedly one 
after the other in a closed loop and as such constitute a 
quantum dynamical system. 

To model the quantum dynamical system the state vector
and operator matrices need to be specified. The four
basis states spanning the Hilbert space are given by:

$$\langle\phi_A| = \langle 1000| , \quad
\langle\phi_B| = \langle 0100| , \quad
\langle\phi_C| = \langle 0010| , \quad
\langle\phi_D| = \langle 0001| .$$

$\phi_A$ corresponds to both ions in electronic state $|0\rangle$. $\phi_B$
corresponds to ion 1 in state $|0\rangle$ and ion 2 in state $|1\rangle$,
and so on. The three unitary operations in matrix form
are Eqs. (50a)-(50c).


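The projection operators are chosen to measure the
electronic state of ion 1 only and have the matrix form:

$$P(0) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\quad \text{and} \quad
P(1) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} . \tag{51}$$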



The QT is now easily assembled. The set of states and
the input and output alphabets are, respectively:
$Q = \{A, B, C, D\}$, $X = \{a, b, c\}$, and $Y = \{0, 1\}$. This QT's
graph is shown in Fig. 13.

To illustrate its operation we consider two measurement
protocols. For each we use input sequence $(abc)^+$.

• Measurement protocol I: Measure ion 1 after each
unitary operation. The resulting state vector
evolution is:

$$\begin{aligned}
\langle\psi_{t+1}| &= \langle\psi_t| U_a P(y) , && (52a) \\
\langle\psi_{t+2}| &= \langle\psi_{t+1}| U_b P(y) , && (52b) \\
\langle\psi_{t+3}| &= \langle\psi_{t+2}| U_c P(y) . && (52c)
\end{aligned}$$

• Measurement protocol II: Measure ion 1 only after
three unitary operations. This leads to evolution
according to

$$\langle\psi_{t+3}| = \langle\psi_t| U_a U_b U_c P(y) . \tag{53}$$



The probability distributions of the observed sequences
are shown in Figs. 14 and 15. The two distributions differ
substantially. On the one hand, protocol II simply yields
the process language of alternating 0s and 1s. Protocol
I, on the other hand, yields a much larger set of allowed
words. In particular, it is striking that $\mathrm{supp}\, \mathcal{P}^{II}$ is
forbidden behavior under protocol I. The words 0101 and
1010 are forbidden under protocol I, whereas they are the
only allowed words of length $L = 4$ under protocol II.

This example illustrates that a simple change in
measurement protocol leads to a substantial change in the
observed dynamics. Moreover, it is not clear a priori when
more complicated behavior is to be expected. In this
example, more frequent measurement yields more
complicated behavior. Without quantifying how complex
that complicated behavior is, it turns out that it is not
always the longer period of coherent, unperturbed
unitary evolution that yields more complex processes. This
will have consequences for feasible implementations of
quantum computational algorithms. For a quantitative
discussion of the languages generated by quantum
processes see Ref.

FIG. 13: Quantum transducer for a trapped-ion system exposed to radiation of various wavelengths. The input alphabet
$X = \{a, b, c\}$ and output alphabet $Y = \{0, 1\}$ represent unitary operations and electronic states, respectively.

FIG. 14: Process language generated by the trapped-ion
quantum dynamical system of Fig. 12 for protocol I
(measurements performed at each time step).

FIG. 15: The generated process languages of the trapped-ion
dynamical system from Fig. 12 for measurements performed
every three time steps.

2. Deutsch Algorithm as a Special Case



It turns out that the trapped-ion experiment implements
a quantum algorithm first introduced by Deutsch [6].
The algorithm provided an explicit example of how
a quantum machine could be superior to a classical one.

Consider a binary-valued function $f : \{1, 2, \ldots, 2N\} \to \{0, 1\}$.
Let $U$ be the device that computes the function $f$.
If we successively apply $f$ to $1, 2, \ldots, 2N$, we get a
string $x^{2N}$ of length $2N$. The problem then is to find a
true statement about $x^{2N}$ by testing the following two
properties:

A: $f$ is not constant: Not only 0s or only 1s in $x^{2N}$.
B: $f$ is not balanced: Not as many 0s as 1s in $x^{2N}$.

If statement A is false, we can be certain that statement
B is true and vice versa. Note that both statements can
be true, in which case the algorithm does not reveal anything
about $f$. Deutsch and Jozsa showed that a
quantum computer can determine the true statement, either
A or B, after only two invocations of the operation
$U$, whereas a classical computer requires $N + 1$ calls in
the worst case. Taking into account the computational
steps for establishing the start state and reading out the
result, a quantum computer can evaluate the function $f$
in constant time, whereas a classical computer needs a
time linear in $N$.

TABLE I: Deutsch algorithm to determine if f(x) is balanced
or constant. H and I are the Hadamard and identity matrices,
respectively. $\otimes$ denotes the tensor product.

Step  Operation                                                 State
1.    Two qubits put in states $\langle 0|$ and $\langle 1|$,   $\langle\psi_0| = \langle 01|$
      respectively.
2.    Hadamard transform applied to both qubits.                $\langle\psi_1| = \langle\psi_0| (H \otimes H)$
3.    Operation $U_f$ implementing the function f(x)            $\langle\psi_2| = (-1)^{f(x)} \langle\psi_1| (I \otimes I)$
      is applied.
4.    Hadamard transform applied to the first qubit.            $\langle\psi_3| = \langle\psi_2| (H \otimes I)$
5.    First qubit is measured.                                  $\langle\psi_3| P(0)$

FIG. 16: Deutsch algorithm to classify balanced and constant
functions (N = 2) depicted as a quantum circuit.
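The classical worst case of N + 1 calls can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `classical_decide` and the sequential query order are hypothetical and do not come from the text. The routine queries f on 1, 2, ... and stops as soon as one of the statements A or B is certain.

```python
def classical_decide(f, n):
    """Return (statement, calls) for a binary function f on {1, ..., 2n}.

    Hypothetical helper for illustration only.
    "A" means "f is not constant", "B" means "f is not balanced".
    """
    first = f(1)
    calls = 1
    while calls < n + 1:
        value = f(calls + 1)   # query the next input in sequence
        calls += 1
        if value != first:
            # Two different values seen: f cannot be constant.
            return "A", calls
    # n + 1 identical values among 2n inputs: f cannot be balanced.
    return "B", calls


if __name__ == "__main__":
    n = 4
    print(classical_decide(lambda x: 0, n))                    # constant function
    print(classical_decide(lambda x: 0 if x <= n else 1, n))   # balanced, worst ordering
    print(classical_decide(lambda x: x % 2, n))                # balanced, best ordering
```

A constant function and a balanced function whose first N values agree both force the full N + 1 calls, which is the linear growth referred to above.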

To compare the algorithm with the trapped-ion dy- 
namical system, and to keep issues simple but still infor- 
mative, we use the basic version (N = 2) of the Deutsch 
algorithm of Ref. [I?], p. 32]. (Recall that in our notation
$\langle\psi|$ is the state vector, not $|\psi\rangle$, as is common
elsewhere.) Figure 16 shows the algorithm as a quantum
circuit. Each qubit represents one ion and occupies one
horizontal line. The applied unitary transformations are
shown as boxes. The overall procedure is summarized in
Table I. The unitary operations $H$ and $U_f$ in Fig. 16 are
the same as $H$ and $U_b$ in the trapped-ion experiment.
The unitary operator is that for a balanced function.
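The five steps of the basic two-qubit Deutsch algorithm can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes inputs x in {0, 1}, the basis ordering index = 2x + y, and the standard bit-flip oracle |x, y> -> |x, y XOR f(x)>, which reproduces the phase factor (-1)^{f(x)} when the second qubit is prepared as in steps 1 and 2. All function names are hypothetical. States are row vectors and operators act from the right, following the notation used here.

```python
from math import sqrt


def kron(a, b):
    """Tensor (Kronecker) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def apply(row, m):
    """Right-multiply a row vector by a matrix: <psi| -> <psi| M."""
    return [sum(row[i] * m[i][j] for i in range(len(row)))
            for j in range(len(m[0]))]


H = [[1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), -1 / sqrt(2)]]  # Hadamard
I2 = [[1.0, 0.0], [0.0, 1.0]]                                  # identity


def oracle(f):
    """Assumed oracle: permutation matrix sending |x, y> to |x, y XOR f(x)>."""
    u = [[0.0] * 4 for _ in range(4)]
    for x in (0, 1):
        for y in (0, 1):
            u[2 * x + y][2 * x + (y ^ f(x))] = 1.0
    return u


def prob_first_qubit_zero(f):
    """Probability of reading 0 on the first qubit after the five steps."""
    psi = [0.0, 1.0, 0.0, 0.0]        # step 1: <psi_0| = <01|
    psi = apply(psi, kron(H, H))      # step 2: Hadamard on both qubits
    psi = apply(psi, oracle(f))       # step 3: one call to the oracle
    psi = apply(psi, kron(H, I2))     # step 4: Hadamard on the first qubit
    return psi[0] ** 2 + psi[1] ** 2  # step 5: P(0), summed over second qubit


if __name__ == "__main__":
    for name, f in [("constant 0", lambda x: 0), ("constant 1", lambda x: 1),
                    ("balanced x", lambda x: x), ("balanced 1-x", lambda x: 1 - x)]:
        print(name, round(prob_first_qubit_zero(f), 6))
```

Under these assumptions the measurement is deterministic: the probability of reading 0 is 1 for both constant functions and 0 for both balanced functions, after a single oracle call. How the two outcomes are labeled A or B depends on the conventions of the experiment and is not fixed by this sketch.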

The implementation of the Deutsch algorithm is equiv- 
alent to the trapped-ion system under measurement pro- 
tocol II, with Ub chosen accordingly. Measuring ion 1 af- 
ter three time steps delivers the desired answer as output 
(0=A or 1=B). Thus, the Deutsch algorithm corresponds 
to the trapped-ion system running for three time steps. 

The Deutsch algorithm task is solved with a consid- 
erable speed-up compared to a classical implementation. 
Our approach is an extension of this that focuses on what 
type of computation is carried out intrinsically by the sys- 
tem under continuous external driving and observation. 
The answer is found in the language diagrams in Figs. 14
and 15. Comparing these two different views of quantum
information manipulation, designed quantum computing
versus quantum intrinsic computation, suggests
that the analysis of NMR experiments with single atoms
or molecules in terms of quantum finite-state machines
will be a straightforward extension of the preceding
analysis of the Deutsch algorithm.



IX. CONCLUDING REMARKS 

We developed a line of inquiry complementary to both 
quantum computation and quantum dynamical systems 
by investigating intrinsic computation in quantum processes.
Laying the foundations for a computational perspective
on quantum dynamical systems, we introduced
the quantum finite-state transducer. Residing at the lowest
level of a quantum computational hierarchy, it is the most
general representation of a finitary quantum process. It
allows for a quantitative description of intrinsic computation
in quantum processes in terms of the number of
internal states and allowed transitions and the process
language it generates. As far as we are aware, this has
not been developed before in the quantum setting.

We laid out the mathematical foundations of these 
models and developed a hierarchy of classical (stochas- 
tic) and quantum machines in terms of the set of process 
languages they recognize or generate. In many cases it 
turned out that quantum devices were less powerful than 
their classical analogs. We saw that the limitations of 
quantum finite-state machines originate in the unitarity 
of the transition matrices. This suggested that QTs, be- 
ing reversible, are less powerful than nonreversible classi- 
cal automata, since the reversibility constrains the tran- 
sition matrices. 

However, one must be careful to not over-interpret this 
state of affairs. It has been known for some time that 
any universal computation can be implemented in a reversible
device [5l|. Typically, this requires substantially
more resources, largely to store outcomes of intermediate
steps. In short, reversibility does not, in general,
imply less power for classical computers. At the end
of the day, computational resources are variables that
trade off against each other. The 2-state QDG examples
of the Beam Splitter process illustrated such a trade-off.
Although the QDG needs more states than the equivalent
SDG to generate the same process language, different
measurement protocols yielded a new set of process
languages, an aspect that makes QDGs more powerful
than SDGs.

These results were then applied to physical systems 
that could be analyzed in terms of the process languages 
they generate. One example, that of two trapped ions,
exhibited a process language with rich structure. This,
and the fact that the system implements a quantum algo- 
rithm, opens up a way to an information-theoretic analy- 
sis of quantum processes. One can begin to analyze quan- 
tum algorithms in terms of their information processing 
power and do so independent of particular physical im- 
plementations. 

In particular, we have used quantum machines to de- 
fine a measure of intrinsic computation for quantum 
dynamical systems [H, [2l|. The basic questions one 
asks about a dynamical system's intrinsic computation — 
amount of historical information stored, storage archi- 
tecture, and transformations of stored information — can 
now be posed for quantum systems. 

Furthermore, we are developing an extension of quan- 
tum machines that supports more general types of mea- 
surement. The resulting quantum transducers are ex- 
pected to have greater power than the current versions, 
possibly even greater than stochastic transducers. Generally,
we hope that ways to integrate quantum computation
and quantum dynamics will receive further attention.
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